bug#54786: Installation tests are failing

2022-08-09 Thread Mathieu Othacehe


Closing as all the installation tests are now fixed.

Thanks to everyone involved :)

Mathieu





bug#54786: Installation tests are failing

2022-06-10 Thread Maxim Cournoyer
Hi Ludo,

Ludovic Courtès  writes:

[...]

>> When using 'make-forkexec-constructor/container', the clone(2) call
>> happens before switching user, thus as 'root' in Shepherd, which
>> explains why it works.
>
> Damnit, that’s right.  For example the result of:
>
>(lower-object (least-authority-wrapper (file-append coreutils "/bin/uname")
>   #:namespaces (delq 'user 
> %namespaces)))
>
> won’t run as an unprivileged user:

[...]

> I think we would add #:user and #:group to ‘least-authority-wrapper’ and
> have it call setuid/setgid.  ‘make-forkexec-constructor’ doesn’t need to
> be modified, but the user simply won’t pass #:user and #:group to it.

OK!  I'll adjust the jami-service-type when we get around to implement
the above; for now I've pushed my proposed fix which still uses
'make-forkexec-constructor/container' as
85b4dabd94d53f8179f31a42046cd83fc3a352fc.

Thanks,

Maxim





bug#54786: Installation tests are failing

2022-06-07 Thread bokr
Hi,

tl;dr: I hope there will be a security team discussing
   whether/how this kind of privileged execution interval
   could be exploited, and how to prevent such.
   
   E.g., could something that stealthily gets put in a finalizer
   for some innocent object be waiting to notice that it is running
   privileged, and do the next step in a dirty-work chain that
   sets things up nicely for e.g. remote DDOS control?

   Or is the independent FLOSS development process
   and its quality control being sabotaged stealthily
   with injections of "innocent mistakes" and
   (ultimately) trivial time-wasting bugs, making FLOSS projects
   look "not ready for production use" ?
   (despite the increasing evidence to the contrary)
  
   BTW, I think a minimalist/MES/RISCV team would be interesting!

   Regards,
   Bengt Richter
   
On +2022-06-07 16:00:54 +0200, Ludovic Courtès wrote:gets\
> Hi!
> 
> Maxim Cournoyer  skribis:
> 
> > Ludovic Courtès  writes:
> 
> [...]
> 
> >>> I reviewed how that works, and it'd be easy; I just didn't see the
> >>> incentive yet (there's no composition needed for the service, and it'd
> >>> make the definition slightly less readable).  If you tell me
> >>> mark+forkexec-constructor/container is going the way of the Dodo though,
> >>> that's a good enough incentive :-).
> >
> > That turns out to be bit problematic; dbus-daemon must not run in its
> > own user namespace (CLONE_NEWUSER) as it wants to validate user/group
> > IDs.  That's probably the reason it was working with
> > 'make-forkexec-constructor/container', as this was dropping the user and
> > net namespaces, contrary to least-authority, which uses them all.
> >
> > The problem then seems to be that since we need CAP_SYS_ADMIN when
> > dropping the user namespace, as CLONE_NEWUSER is what gives us
> > superpowers.  Per 'man user_namespaces':
> >
> >   The child process created by clone(2) with the CLONE_NEWUSER flag starts
> >   out with a complete set of capabilities in the new user namespace.
> >
> > Which means that if we combine something like (untested):
> >
> > (make-forkexec-constructor
> >   (least-authority
> > (list (file-append coreutils "/bin/true"))
> > (mappings (delq 'user %namespaces))
> >   #:user  "nobody"
> >   #:group "nobody"))
> >
> > the make-forkexec-constructor will switch to the non-privileged user
> > before the clone call is made, and it will fail with EPERM.
> >
> > When using 'make-forkexec-constructor/container', the clone(2) call
> > happens before switching user, thus as 'root' in Shepherd, which
> > explains why it works.
> 
> Damnit, that’s right.  For example the result of:
> 
>(lower-object (least-authority-wrapper (file-append coreutils "/bin/uname")
>   #:namespaces (delq 'user 
> %namespaces)))
> 
> won’t run as an unprivileged user:
> 
> --8<---cut here---start->8---
> $ $(guix build /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv)
> substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
> substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 
> 100.0%
> substitute: updating substitutes from 'https://guix.bordeaux.inria.fr'... 
> 100.0%
> The following derivations will be built:
>   /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv
>   /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv
> building /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv...
> successfully built /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv
> building /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv...
> successfully built 
> /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv
> Backtrace:
>5 (primitive-load 
> "/gnu/store/ifsh87aifh2k8pqzhkjxncq3vskpwx3l-pola-wrapper")
> In ice-9/eval.scm:
>191:35  4 (_ #f)
> In gnu/build/linux-container.scm:
> 300:8  3 (call-with-temporary-directory # gnu/build/linux-container.scm:396:3 (root)>)
>397:16  2 (_ "/tmp/guix-directory.K9gBNH")
> 239:7  1 (run-container "/tmp/guix-directory.K9gBNH" (#< 
> device: "/gnu/store/jkjs0inmzhj4vsvclbf08nmh0shm7lrf-attr-2.5…> …) …)
> In guix/build/syscalls.scm:
>   1099:12  0 (_ 1845624849)
> 
> guix/build/syscalls.scm:1099:12: In procedure clone: 1845624849: Operation 
> not permitted
> --8<---cut here---end--->8---
> 
> > I'm not sure how it could be fixed; it seems the user changing business
> > would need to be handled by the least-authority-wrapper code?  And the
> > make-forkexec-constructor would probably need to detect that command is
> > a pola wrapper and then avoid changing the user/group itself to not
> > interfere.
> 
> I think we would add #:user and #:group to ‘least-authority-wrapper’ and
> have it call setuid/setgid.  ‘make-forkexec-constructor’ doesn’t need to
> be modified, but t

bug#54786: Installation tests are failing

2022-06-07 Thread Ludovic Courtès
Hi!

Maxim Cournoyer  skribis:

> Ludovic Courtès  writes:

[...]

>>> I reviewed how that works, and it'd be easy; I just didn't see the
>>> incentive yet (there's no composition needed for the service, and it'd
>>> make the definition slightly less readable).  If you tell me
>>> mark+forkexec-constructor/container is going the way of the Dodo though,
>>> that's a good enough incentive :-).
>
> That turns out to be bit problematic; dbus-daemon must not run in its
> own user namespace (CLONE_NEWUSER) as it wants to validate user/group
> IDs.  That's probably the reason it was working with
> 'make-forkexec-constructor/container', as this was dropping the user and
> net namespaces, contrary to least-authority, which uses them all.
>
> The problem then seems to be that since we need CAP_SYS_ADMIN when
> dropping the user namespace, as CLONE_NEWUSER is what gives us
> superpowers.  Per 'man user_namespaces':
>
>   The child process created by clone(2) with the CLONE_NEWUSER flag starts
>   out with a complete set of capabilities in the new user namespace.
>
> Which means that if we combine something like (untested):
>
> (make-forkexec-constructor
>   (least-authority
> (list (file-append coreutils "/bin/true"))
> (mappings (delq 'user %namespaces))
>   #:user  "nobody"
>   #:group "nobody"))
>
> the make-forkexec-constructor will switch to the non-privileged user
> before the clone call is made, and it will fail with EPERM.
>
> When using 'make-forkexec-constructor/container', the clone(2) call
> happens before switching user, thus as 'root' in Shepherd, which
> explains why it works.

Damnit, that’s right.  For example the result of:

   (lower-object (least-authority-wrapper (file-append coreutils "/bin/uname")
  #:namespaces (delq 'user 
%namespaces)))

won’t run as an unprivileged user:

--8<---cut here---start->8---
$ $(guix build /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv)
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: updating substitutes from 'https://guix.bordeaux.inria.fr'... 100.0%
The following derivations will be built:
  /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv
  /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv
building /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv...
successfully built /gnu/store/bd63i07rvvsw7xgsig0cbdsw7fpznd1k-references.drv
building /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv...
successfully built /gnu/store/hy8rd8p8pid67ac27dwm63svl5bqn0a1-pola-wrapper.drv
Backtrace:
   5 (primitive-load 
"/gnu/store/ifsh87aifh2k8pqzhkjxncq3vskpwx3l-pola-wrapper")
In ice-9/eval.scm:
   191:35  4 (_ #f)
In gnu/build/linux-container.scm:
300:8  3 (call-with-temporary-directory #)
   397:16  2 (_ "/tmp/guix-directory.K9gBNH")
239:7  1 (run-container "/tmp/guix-directory.K9gBNH" (#< 
device: "/gnu/store/jkjs0inmzhj4vsvclbf08nmh0shm7lrf-attr-2.5…> …) …)
In guix/build/syscalls.scm:
  1099:12  0 (_ 1845624849)

guix/build/syscalls.scm:1099:12: In procedure clone: 1845624849: Operation not 
permitted
--8<---cut here---end--->8---

> I'm not sure how it could be fixed; it seems the user changing business
> would need to be handled by the least-authority-wrapper code?  And the
> make-forkexec-constructor would probably need to detect that command is
> a pola wrapper and then avoid changing the user/group itself to not
> interfere.

I think we would add #:user and #:group to ‘least-authority-wrapper’ and
have it call setuid/setgid.  ‘make-forkexec-constructor’ doesn’t need to
be modified, but the user simply won’t pass #:user and #:group to it.

Thanks,
Ludo’.





bug#54786: Installation tests are failing

2022-06-03 Thread Maxim Cournoyer
Hi Ludovic!

Ludovic Courtès  writes:

> Howdy!
>
> Maxim Cournoyer  skribis:

[...]

>> I reviewed how that works, and it'd be easy; I just didn't see the
>> incentive yet (there's no composition needed for the service, and it'd
>> make the definition slightly less readable).  If you tell me
>> mark+forkexec-constructor/container is going the way of the Dodo though,
>> that's a good enough incentive :-).

That turns out to be bit problematic; dbus-daemon must not run in its
own user namespace (CLONE_NEWUSER) as it wants to validate user/group
IDs.  That's probably the reason it was working with
'make-forkexec-constructor/container', as this was dropping the user and
net namespaces, contrary to least-authority, which uses them all.

The problem then seems to be that since we need CAP_SYS_ADMIN when
dropping the user namespace, as CLONE_NEWUSER is what gives us
superpowers.  Per 'man user_namespaces':

  The child process created by clone(2) with the CLONE_NEWUSER flag starts
  out with a complete set of capabilities in the new user namespace.

Which means that if we combine something like (untested):

--8<---cut here---start->8---
(make-forkexec-constructor
  (least-authority
(list (file-append coreutils "/bin/true"))
(mappings (delq 'user %namespaces))
  #:user  "nobody"
  #:group "nobody"))
--8<---cut here---end--->8---

the make-forkexec-constructor will switch to the non-privileged user
before the clone call is made, and it will fail with EPERM.

When using 'make-forkexec-constructor/container', the clone(2) call
happens before switching user, thus as 'root' in Shepherd, which
explains why it works.

I'm not sure how it could be fixed; it seems the user changing business
would need to be handled by the least-authority-wrapper code?  And the
make-forkexec-constructor would probably need to detect that command is
a pola wrapper and then avoid changing the user/group itself to not
interfere.

To be continued!

Maxim





bug#54786: Installation tests are failing

2022-06-02 Thread Ludovic Courtès
Howdy!

Maxim Cournoyer  skribis:

> I tried capturing the issue in the commit message, but I'll provide
> another more hands-on view: the Jami service was broken due to changes
> in Shepherd 0.9.0 that caused the blocking sleeps + concurrent
> make+forkexec-constructor/container and fork+exec-command combination
> used to not work anymore.

OK.  Thanks for sharing the strace log; at first sight I don’t see any
clear clue, but hey, maybe it’s fine to leave that as a mystery since
there’s another solution.

[...]

>> Longer-term I think this should go in Jami proper though.  It’s great
>> that Guix has an edge over the competition :-), but having to maintain
>> it is less nice.
>
> Perhaps with the Scheme bindings introduced by Olivier for the Jami
> tests (that work via an embedded libguile), it could be possible to add
> the ability to pass an init script to 'jamid' at launch time, which
> would automate importing the account.  Proper 'Scheme' bindings would be
> nice though, and I'd like to look into the feasibility to add these via
> Swig.  Food for thought.

Sounds fun.  (BTW, I’d recommend against SWIG: it’s not “pretty”, leaves
a lot of work to do, including wrapping the generated wrappers and
debugging memory management issue.  Using the FFI provides more
flexibility and is much more fun IMO.)

>> Also, in more concrete terms: one goal of the least-authority work at
>>  is to remove
>> ‘make-forkexec-constructor/container’ and the whole (gnu build shepherd)
>> module.  Jami is one of its last remaining users (adjusting it felt like
>> beyond my abilities, precisely because it’s much more complex than the
>> other services I adjusted).
>>
>> Could you take a look at that eventually, once this patch has been
>> reviewed?
>
> I reviewed how that works, and it'd be easy; I just didn't see the
> incentive yet (there's no composition needed for the service, and it'd
> make the definition slightly less readable).  If you tell me
> mark+forkexec-constructor/container is going the way of the Dodo though,
> that's a good enough incentive :-).

Awesome.  :-)

Thanks for explaining!

Ludo’.





bug#54786: Installation tests are failing

2022-06-02 Thread Maxim Cournoyer
Hi Ludovic,

Ludovic Courtès  writes:

> Hi Maxim,
>
> Maxim Cournoyer  skribis:
>
>> Ludovic Courtès  writes:
>
> [...]
>
>>> Before going further, I’d like to understand: this does more than just
>>> fix the Jami system tests, right?
>>>
>>> It would have been nice to have surgical changes to “just” fix the
>>> tests, along the lines of ,
>>> possibly followed by a rework of the whole machinery, if that’s
>>> possible.
>>
>> It's not really possible unfortunately, because the rework from talking
>> to the D-Bus API via the 'dbus-send' binary to using Guile AC/D-bus was
>> needed or at least simplified fixing the issues.
>
> So am I right that the “issues” were not specifically related to the
> Shepherd 0.9.0 switch, or at least not just to that?  (Just to make sure
> I understand the context.)

I tried capturing the issue in the commit message, but I'll provide
another more hands-on view: the Jami service was broken due to changes
in Shepherd 0.9.0 that caused the blocking sleeps + concurrent
make+forkexec-constructor/container and fork+exec-command combination
used to not work anymore.

This problem can be manually observed by spawning a VM with the Jami
service:

$(guix system vm --no-graphic -e '(@@ (gnu tests telephony) %jami-os)') -m 512

Then you'll see the service doesn't even start:

--8<---cut here---start->8---
root@jami ~# herd status
[...]
Stopped:
 - jami
[...]

root@jami ~# pgrep jamid
192

root@jami ~# killall jamid

root@jami ~# herd start jami
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

12:53:47.144 os_core_unix.c !pjlib 2.11 for POSIX initialized

herd: exception caught while executing 'start' on service 'jami':
Throw to key `match-error' with args `("match" "no matching pattern" #f)'.
--8<---cut here---end--->8---

I've ran this: herd start jami& strace -p1 -f -s800 -o strace.out

Attached is the last 10% of the gzip'd file.  I couldn't explain this
failure very clearly, but when I tried investigating it was failing on
the '(dbus-service-available? "cx.ring.Ring")' call, if I recall
correctly.



strace.out.gz
Description: shepherd pid1 strace

[...]

>>> What do you think is missing upstream so that starting Jami is
>>> simpler?
>>
>> 1) Lack of D-Bus support in Shepherd to easily start D-Bus services.
>> The upstream systemd service definition for the Jami daemon (jamid) is
>> this:
>>
>> # net.jami.daemon.service
>> [D-BUS Service]
>> Name=cx.ring.Ring
>> Exec=@LIBDIR@/jamid
>>
>> But that's nearly not where the complexity of our jami-service-type
>> lies.
>
> But that should be fine: we have dozens of D-Bus services that happily
> get started by dbus-daemon.

I guess that works (minus races like we've had with elogind) if the
other services are also D-Bus services sharing the same bus.  But here
the code talking with Jami are in the Shepherd service actions and more
critically in the start slot itself -- so it's important the D-Bus
service has been acquired and ready to service D-Bus calls otherwise
they'd fail (that's what the loop polling for (jami-service-available?)
ensures).

>> Rather, it's in the following:
>>
>> 2) The lack of a way to declaratively configure Jami and the need to use
>> D-Bus API to issue commands to Jami non-interactively.  For example, to
>> have Jami import an account it's necessary to go via either
>>
>> a) the GUI or
>> b) the D-Bus API
>>
>> The Jami service in Guix makes use of b), which introduces the need for
>> some Scheme bindings wrapping the low-level D-Bus interface.  Perhaps
>> such bindings could live in Jami itself.
>>
>> The second point (2) could be addressed upstream, but since it's a
>> rather niche use case (the common use case is simply running the client
>> GUI), is already achievable via D-Bus, and would probably require a
>> considerable amount of work in Jami itself, I think we can keep it as is
>> for now, as a Guix System exclusive feature ;-).  Note that even if Jami
>> could be configured via configuration files, we'd still want to be able
>> to communicate with it via D-Bus to maintain the possible actions
>> currently available in our Shepherd service (listing/enabling/disable
>> accounts, etc.), so it'd only really help to reduce the start slot, and
>> that's it.  We'd still need most of the D-Bus bindings, so it wouldn't
>> help that much anyway.
>
> Ah I see.
>
>> I hope that clarifies how our jami-service-type is both complex but also
>> unique.
>
> Sure, the ability to configure Jami in a declarative and stateless
> fashion is a plus, that’s really cool.
>
> Longer-term I think this should go in Jami proper though.  It’s great
> that Guix has an edge over the competition :-), but having to maintain
> it is less nice.

Perhaps with the Scheme bindings introduced by Olivier for the Jami
tests (that work via an embedded libguile), i

bug#54786: Installation tests are failing

2022-06-02 Thread Ludovic Courtès
Hi Maxim,

Maxim Cournoyer  skribis:

> Ludovic Courtès  writes:

[...]

>> Before going further, I’d like to understand: this does more than just
>> fix the Jami system tests, right?
>>
>> It would have been nice to have surgical changes to “just” fix the
>> tests, along the lines of ,
>> possibly followed by a rework of the whole machinery, if that’s
>> possible.
>
> It's not really possible unfortunately, because the rework from talking
> to the D-Bus API via the 'dbus-send' binary to using Guile AC/D-bus was
> needed or at least simplified fixing the issues.

So am I right that the “issues” were not specifically related to the
Shepherd 0.9.0 switch, or at least not just to that?  (Just to make sure
I understand the context.)

>> Besides, I think we should talk to Jami upstream (which shouldn’t be too
>> hard :-)).  It doesn’t seem reasonable to me to have 800+ lines of code
>> in the distro to start one service.  Usually the ‘start’ and ‘stop’
>> methods are between 2 and 10 lines of code.
>>
>> What do you think is missing upstream so that starting Jami is
>> simpler?
>
> 1) Lack of D-Bus support in Shepherd to easily start D-Bus services.
> The upstream systemd service definition for the Jami daemon (jamid) is
> this:
>
> # net.jami.daemon.service
> [D-BUS Service]
> Name=cx.ring.Ring
> Exec=@LIBDIR@/jamid
>
> But that's nearly not where the complexity of our jami-service-type
> lies.

But that should be fine: we have dozens of D-Bus services that happily
get started by dbus-daemon.

> Rather, it's in the following:
>
> 2) The lack of a way to declaratively configure Jami and the need to use
> D-Bus API to issue commands to Jami non-interactively.  For example, to
> have Jami import an account it's necessary to go via either
>
> a) the GUI or
> b) the D-Bus API
>
> The Jami service in Guix makes use of b), which introduces the need for
> some Scheme bindings wrapping the low-level D-Bus interface.  Perhaps
> such bindings could live in Jami itself.
>
> The second point (2) could be addressed upstream, but since it's a
> rather niche use case (the common use case is simply running the client
> GUI), is already achievable via D-Bus, and would probably require a
> considerable amount of work in Jami itself, I think we can keep it as is
> for now, as a Guix System exclusive feature ;-).  Note that even if Jami
> could be configured via configuration files, we'd still want to be able
> to communicate with it via D-Bus to maintain the possible actions
> currently available in our Shepherd service (listing/enabling/disable
> accounts, etc.), so it'd only really help to reduce the start slot, and
> that's it.  We'd still need most of the D-Bus bindings, so it wouldn't
> help that much anyway.

Ah I see.

> I hope that clarifies how our jami-service-type is both complex but also
> unique.

Sure, the ability to configure Jami in a declarative and stateless
fashion is a plus, that’s really cool.

Longer-term I think this should go in Jami proper though.  It’s great
that Guix has an edge over the competition :-), but having to maintain
it is less nice.

Also, in more concrete terms: one goal of the least-authority work at
 is to remove
‘make-forkexec-constructor/container’ and the whole (gnu build shepherd)
module.  Jami is one of its last remaining users (adjusting it felt like
beyond my abilities, precisely because it’s much more complex than the
other services I adjusted).

Could you take a look at that eventually, once this patch has been
reviewed?

Thanks,
Ludo’.





bug#54786: Installation tests are failing

2022-06-01 Thread Maxim Cournoyer
Hi Ludovic,

Ludovic Courtès  writes:

> Hi Maxim,
>
> Maxim Cournoyer  skribis:
>
>>  gnu/build/dbus-service.scm | 212 
>>  gnu/build/jami-service.scm | 390 +
>>  gnu/local.mk   |   1 +
>>  gnu/packages/glib.scm  |  19 +-
>>  gnu/services/telephony.scm | 500 +
>>  gnu/tests/telephony.scm| 412 +++---
>>  6 files changed, 726 insertions(+), 808 deletions(-)
>>  create mode 100644 gnu/build/dbus-service.scm
>
> Before going further, I’d like to understand: this does more than just
> fix the Jami system tests, right?
>
> It would have been nice to have surgical changes to “just” fix the
> tests, along the lines of ,
> possibly followed by a rework of the whole machinery, if that’s
> possible.

It's not really possible unfortunately, because the rework from talking
to the D-Bus API via the 'dbus-send' binary to using Guile AC/D-bus was
needed or at least simplified fixing the issues.  Going back trying to
make it work the way it was would be new work that'd end up being
scrapped anyway with a subsequent commit making use of the Guile D-Bus
library, so I'm not interested in pursuing it.

> Besides, I think we should talk to Jami upstream (which shouldn’t be too
> hard :-)).  It doesn’t seem reasonable to me to have 800+ lines of code
> in the distro to start one service.  Usually the ‘start’ and ‘stop’
> methods are between 2 and 10 lines of code.
>
> What do you think is missing upstream so that starting Jami is
> simpler?

1) Lack of D-Bus support in Shepherd to easily start D-Bus services.
The upstream systemd service definition for the Jami daemon (jamid) is
this:

--8<---cut here---start->8---
# net.jami.daemon.service
[D-BUS Service]
Name=cx.ring.Ring
Exec=@LIBDIR@/jamid
--8<---cut here---end--->8---

But that's nearly not where the complexity of our jami-service-type
lies.  Rather, it's in the following:

2) The lack of a way to declaratively configure Jami and the need to use
D-Bus API to issue commands to Jami non-interactively.  For example, to
have Jami import an account it's necessary to go via either

a) the GUI or
b) the D-Bus API

The Jami service in Guix makes use of b), which introduces the need for
some Scheme bindings wrapping the low-level D-Bus interface.  Perhaps
such bindings could live in Jami itself.

The second point (2) could be addressed upstream, but since it's a
rather niche use case (the common use case is simply running the client
GUI), is already achievable via D-Bus, and would probably require a
considerable amount of work in Jami itself, I think we can keep it as is
for now, as a Guix System exclusive feature ;-).  Note that even if Jami
could be configured via configuration files, we'd still want to be able
to communicate with it via D-Bus to maintain the possible actions
currently available in our Shepherd service (listing/enabling/disable
accounts, etc.), so it'd only really help to reduce the start slot, and
that's it.  We'd still need most of the D-Bus bindings, so it wouldn't
help that much anyway.

I hope that clarifies how our jami-service-type is both complex but also
unique.

Happy video-conferencing!

Maxim





bug#54786: Installation tests are failing

2022-06-01 Thread Ludovic Courtès
Hi Maxim,

Maxim Cournoyer  skribis:

>  gnu/build/dbus-service.scm | 212 
>  gnu/build/jami-service.scm | 390 +
>  gnu/local.mk   |   1 +
>  gnu/packages/glib.scm  |  19 +-
>  gnu/services/telephony.scm | 500 +
>  gnu/tests/telephony.scm| 412 +++---
>  6 files changed, 726 insertions(+), 808 deletions(-)
>  create mode 100644 gnu/build/dbus-service.scm

Before going further, I’d like to understand: this does more than just
fix the Jami system tests, right?

It would have been nice to have surgical changes to “just” fix the
tests, along the lines of ,
possibly followed by a rework of the whole machinery, if that’s
possible.

Besides, I think we should talk to Jami upstream (which shouldn’t be too
hard :-)).  It doesn’t seem reasonable to me to have 800+ lines of code
in the distro to start one service.  Usually the ‘start’ and ‘stop’
methods are between 2 and 10 lines of code.

What do you think is missing upstream so that starting Jami is simpler?

Thanks,
Ludo’.





bug#54786: Installation tests are failing

2022-05-28 Thread Ludovic Courtès
Hi Maxim,

Maxim Cournoyer  skribis:

> Ludovic Courtès  writes:
>
>> Hi,
>>
>> Mathieu Othacehe  skribis:
>>
>>> Thanks for the fix! The jami and jami-provisioning tests are also broken
>>> because of what looks like to be the same issue:
>>>
>>> One does not simply initialize the client: Another daemon is detected
>>> /gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
>>> ERROR:
>>>   1. &action-exception-error:
>>>   service: jami
>>>   action: start
>>>   key: match-error
>>>   args: ("match" "no matching pattern" #f)
>>> Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
>>> https://jami.net/
>>> [Video support enabled]
>>> [Plugins support enabled]
>>
>> Yes, I noticed that, but I’m not sure how to apply a similar workaround.
>
> I tried fixing that today, but so far I've only managed to understand
> what seems to be going wrong, with this (not so great) workflow:

While working on , I figured
‘wait-for-service’ could be useful for system tests that were previously
using ‘start-service’ as a way to wait for a service to be up and
running.

I tried the following change, which should be semantically equivalent to
what was happening with the Shepherd 0.8.  However, it doesn’t seem to
work, for reasons that escape me.

Thoughts?

Ludo’.

diff --git a/gnu/tests/telephony.scm b/gnu/tests/telephony.scm
index bc464a431a..c219868859 100644
--- a/gnu/tests/telephony.scm
+++ b/gnu/tests/telephony.scm
@@ -145,11 +145,7 @@ (define marionette
 (marionette-eval
  '(begin
 (use-modules (gnu services herd))
-(match (start-service 'jami)
-  (#f #f)
-  (('service response-parts ...)
-   (match (assq-ref response-parts 'running)
- ((pid) (number? pid))
+(wait-for-service 'jami #:timeout 60))
  marionette))
 
   (test-assert "service can be stopped"
@@ -158,12 +154,7 @@ (define marionette
 (use-modules (gnu services herd)
  (rnrs base))
 (setenv "PATH" "/run/current-system/profile/bin")
-(let ((pid (match (start-service 'jami)
- (#f #f)
- (('service response-parts ...)
-  (match (assq-ref response-parts 'running)
-((pid) pid))
-
+(let ((pid (wait-for-service 'jami)))
   (assert (number? pid))
 
   (match (stop-service 'jami)
@@ -193,14 +184,10 @@ (define pid (match (start-service 'jami)
 ;; Restart the service.
 (restart-service 'jami)
 
-(define new-pid (match (start-service 'jami)
-  (#f #f)
-  (('service response-parts ...)
-   (match (assq-ref response-parts 'running)
- ((pid) pid)
+(define new-pid (wait-for-service 'jami))
 (assert (number? new-pid))
 
-(not (eq? pid new-pid)))
+(not (= pid new-pid)))
  marionette))
 
   (unless #$provisioning? (test-skip 1))


bug#54786: Installation tests are failing

2022-05-24 Thread Maxim Cournoyer
Hi,

Ludovic Courtès  writes:

> Hi,
>
> Mathieu Othacehe  skribis:
>
>> Thanks for the fix! The jami and jami-provisioning tests are also broken
>> because of what looks like to be the same issue:
>>
>> One does not simply initialize the client: Another daemon is detected
>> /gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
>> ERROR:
>>   1. &action-exception-error:
>>   service: jami
>>   action: start
>>   key: match-error
>>   args: ("match" "no matching pattern" #f)
>> Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
>> https://jami.net/
>> [Video support enabled]
>> [Plugins support enabled]
>
> Yes, I noticed that, but I’m not sure how to apply a similar workaround.

I tried fixing that today, but so far I've only managed to understand
what seems to be going wrong, with this (not so great) workflow:

1. Add pk uses in the code.

2. $(./pre-inst-env guix system vm --no-graphic -e '(@@ (gnu tests
telephony) %jami-os)' --no-offload --no-substitutes) -m 512 -nic
user,model=virtio-net-pci,hostfwd=tcp::10022-:22

3. ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p
10022 root@localhost

and poke around with 'herd status', read /var/log/messages, experiment
with dbus-send, etc.

This allowed me to find out that (dbus-available-services) appears to be
broken.  I'm not sure why the exceptions are reported so badly by
Shepherd (are exceptions raised with 'error' not handled by Shepherd or
something? -- the with-retries loop should end up printing the caught
exception arguments -- I would also have expected to see the backtrace
somewhere.

Anyway, connecting to another machine that is running the
jami-service-type still (hasn't been reconfigured in a while), I could
see:

--8<---cut here---start->8---
scheme@(guix-user)> ,use (gnu build jami-service)
scheme@(guix-user)> (dbus-available-services)
;;; Failed to autoload fork+exec-command in (shepherd service):
;;; no code for module (fibers)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
error: fork+exec-command: unbound variable
--8<---cut here---end--->8---

Oh yes, so it now requires guile-fibers.  After installing it:

--8<---cut here---start->8---
scheme@(guix-user)> ,use (gnu build jami-service)
scheme@(guix-user)> (dbus-available-services)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
No scheduler current; call within run-fibers instead
--8<---cut here---end--->8---

So the users of fork+exec-command (a public API) needs to be adjusted.
I suspect that's the crux of the issue here.  The rest (the jami tests
using Shepherd's start-service to check the service status and causing
multiple starts) should be easy to workaround.

To be continued...

Maxim





bug#54786: Installation tests are failing

2022-05-01 Thread Ludovic Courtès
Hi,

Mathieu Othacehe  skribis:

> Thanks for the fix! The jami and jami-provisioning tests are also broken
> because of what looks like to be the same issue:
>
> One does not simply initialize the client: Another daemon is detected
> /gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
> ERROR:
>   1. &action-exception-error:
>   service: jami
>   action: start
>   key: match-error
>   args: ("match" "no matching pattern" #f)
> Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
> https://jami.net/
> [Video support enabled]
> [Plugins support enabled]

Yes, I noticed that, but I’m not sure how to apply a similar workaround.

> I think we don't have the right approach here: we should check that the
> system tests are passing before pushing series and not adapt the tests
> afterwards.

Yes, apologies for that.

> Historically this was difficult because the system tests were often in a
> semi-broken state. Before the Shepherd update the tests were however all
> passing (modulo rare intermittent failures).
>
> As it's not always obvious what's going to break the system tests and
> what's not (simple package update can easily break them), it would be
> really nice to have mandatory commit verification.
>
> The mumi/cuirass gateway that has already been discussed could really
> help us here. If some people are motivated, we could split the work and
> introduce such a mechanism.

Yes, I agree; an “always green” ‘master’ branch would be great.

Do you have milestones in mind for “commit verification”?

As I see it, the difficulty is that we’ve been looking at a horizon of
features à la GitLab-CI without being quite sure how to get there (apart
from deploying GitLab or a similar tool, that is).

A first step that comes to mind would be an easier way to set up
transient jobsets for a branch (or, ideally, for an issue: the thing
would apply patches and create the branch).

Thoughts?

(Maybe worth moving to guix-devel.)

Ludo’.





bug#54786: Installation tests are failing

2022-04-30 Thread Mathieu Othacehe


Hey Ludo,

> I pushed something along these lines as
> 73afbb0765f76834b53c9fe6cf3c8f740840.

Thanks for the fix! The jami and jami-provisioning tests are also broken
because of what looks like to be the same issue:

--8<---cut here---start->8---
One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
ERROR:
  1. &action-exception-error:
  service: jami
  action: start
  key: match-error
  args: ("match" "no matching pattern" #f)
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]
--8<---cut here---end--->8---

I think we don't have the right approach here: we should check that the
system tests are passing before pushing series and not adapt the tests
afterwards.

Historically this was difficult because the system tests were often in a
semi-broken state. Before the Shepherd update the tests were however all
passing (modulo rare intermittent failures).

As it's not always obvious what's going to break the system tests and
what's not (simple package update can easily break them), it would be
really nice to have mandatory commit verification.

The mumi/cuirass gateway that has already been discussed could really
help us here. If some people are motivated, we could split the work and
introduce such a mechanism.

Thanks,

Mathieu






bug#54786: Installation tests are failing

2022-04-29 Thread Ludovic Courtès
Ludovic Courtès  skribis:

> In the interim, perhaps we can work around that by using a different
> check to determine whether the service is running.  For instance,
> instead of:
>
>   (test-assert "nginx running"
> (marionette-eval
>  '(begin
> (use-modules (gnu services herd))
> (start-service 'nginx))
>  marionette))
>
> … we’d write something like:
>
>   (test-assert "nginx running"
> (wait-for-file "/var/run/nginx/pid"))

I pushed something along these lines as
73afbb0765f76834b53c9fe6cf3c8f740840.

I wasn’t able to fix the tailon test because the ‘tailon’ package
doesn’t build and I failed to address that in a timely fashion.

Ludo’.





bug#54786: Installation tests are failing

2022-04-28 Thread Ludovic Courtès
Hi!

Mathieu Othacehe  skribis:

>> * cgit-test (https://ci.guix.gnu.org/build/646812/details)
>
> The nginx daemon seems to be started multiple times:

I believe this is caused by a change of semantics (really: a bug) in the
shepherd ‘start’ method in 0.9.0.

Previously, ‘start’ would wait until the daemon was started.  If the
service was being started, shepherd wouldn’t reply until it was done
starting it.

In 0.9.0, shepherd replies even while it’s waiting for the service to be
started.  But as a consequence, it lets you start a service that is
already being started, leading to this mess you reported.


The proper fix is to better track the status of each service in
shepherd, and to prevent double-starts.

In the interim, perhaps we can work around that by using a different
check to determine whether the service is running.  For instance,
instead of:

  (test-assert "nginx running"
(marionette-eval
 '(begin
(use-modules (gnu services herd))
(start-service 'nginx))
 marionette))

… we’d write something like:

  (test-assert "nginx running"
(wait-for-file "/var/run/nginx/pid"))

Thoughts?  I’ll give that a try.

Thanks for the heads-up!

Ludo’.





bug#54786: Installation tests are failing

2022-04-28 Thread Mathieu Othacehe


Hello,

Those tests are still failing. It looks like most of the failures are
caused by daemons started multiple times.

> * cgit-test (https://ci.guix.gnu.org/build/646812/details)

The nginx daemon seems to be started multiple times:

--8<---cut here---start->8---
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)



This is the GNU system.  Welcome.
komputilo login: nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address 
already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] still could not bind()
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
ERROR:
  1. &action-exception-error:
  service: nginx
  action: start
  key: %exception
  args: ("#<&invoke-error program: 
\"/gnu/store/815abphg8vr8qkl8ykd8pyxp1v62c9gk-nginx-1.21.6/sbin/nginx\" 
arguments: (\"-c\" \"/gnu/store/rbjgg41p22lgkjwrc8inrhbmqah54cgq-nginx.conf\" 
\"-p\" \"/var/run/nginx\") exit-status: 1 term-signal: #f stop-signal: #f>")

Tests failed, dumping log file 
'/gnu/store/p72g83l9nag6c830pzwgcgpnvnyr53p1-cgit-test/cgit.log'.
--8<---cut here---end--->8---

> * gitile-test (https://ci.guix.gnu.org/build/646813/details)

The nginx daemon seems to be started multiple times:

--8<---cut here---start->8---
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)



This is the GNU system.  Welcome.
komputilo login: nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address 
already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:19418 failed (98: Address already in use)
nginx: [emerg] still could not bind()
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
ERROR:
  1. &action-exception-error:
  service: nginx
  action: start
  key: %exception
  args: ("#<&invoke-error program: 
\"/gnu/store/815abphg8vr8qkl8ykd8pyxp1v62c9gk-nginx-1.21.6/sbin/nginx\" 
arguments: (\"-c\" \"/gnu/store/ayafihmfwg3yw4hp8nw622g2rr9mw7vn-nginx.conf\" 
\"-p\" \"/var/run/nginx\") exit-status: 1 term-signal: #f stop-signal: #f>")

Tests failed, dumping log file 
'/gnu/store/ix0hpwpr7b6zh20arig9bpg2lqzysxi7-gitile-test/gitile.log'.
--8<---cut here---end--->8---

> * jami-provisioning-test (https://ci.guix.gnu.org/build/646810/details)
> * jami-test (https://ci.guix.gnu.org/build/646811/details)

Looks like those tests are failing because the daemon is started
multiple times:

--8<---cut here---start->8---
This is the GNU system.  Welcome.
jami login: Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

12:21:08.165 os_core_unix.c !pjlib 2.11 for POSIX initialized
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
ERROR:
  1. &action-exception-error:
  service: jami
  action: start
  key: match-error
  args: ("match" "no matching pattern" #f)
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
ERROR:
  1. &action-exception-error:
  service: jami
  action: start
  key: match-error
  args: ("match" "no matching pattern" #f)
Jami Daemon 11.0.0, by Savoir-faire Linux 2004-2019
https://jami.net/
[Video support enabled]
[Plugins support enabled]

One does not simply initialize the client: Another daemon is detected
/gnu/store/01phrvxnxrg1q0gxa35g7f77q06crf6v-shepherd-marionette.scm:1:1718: 
ERROR:
  1. &action-exception-error:
  service: jami
  action: start
  key: match-error
  args: ("match" "no matching pattern" #f)
--8<---cut here---end--->8---

Thanks,

Mathieu





bug#54786: Installation tests are failing

2022-04-08 Thread Mathieu Othacehe


The following tests are also failing since the Shepherd upgrade:

* cgit-test (https://ci.guix.gnu.org/build/646812/details)
* tailon-test (https://ci.guix.gnu.org/build/646822/details)
* gitile-test (https://ci.guix.gnu.org/build/646813/details)
* jami-provisioning-test (https://ci.guix.gnu.org/build/646810/details)
* jami-test (https://ci.guix.gnu.org/build/646811/details)

Thanks,

Mathieu





bug#54786: Installation tests are failing

2022-04-08 Thread Mathieu Othacehe


Hello,

The installation tests are failing this way:

--8<---cut here---start->8---
conversation expecting pattern ((quote pause))
Apr  7 17:41:58 localhost installer[227]: guix system: error: failed to connect 
to `/var/guix/daemon-socket/socket': Connection refused 
--8<---cut here---end--->8---

this is right after the 'guix-daemon' service is restarted. It looks
like this regression is introduced by the switch to the new Shepherd
release.

See:
https://ci.guix.gnu.org/build/646754/details
https://ci.guix.gnu.org/build/646759/details
https://ci.guix.gnu.org/build/646766/details
https://ci.guix.gnu.org/build/646773/details

Thanks,

Mathieu