Branch: refs/heads/master
  Home:   https://github.com/NixOS/nixpkgs
  Commit: 67223ee205364afb203361b134f16b890c4d726c
      
https://github.com/NixOS/nixpkgs/commit/67223ee205364afb203361b134f16b890c4d726c
  Author: aszlig <[email protected]>
  Date:   2016-05-06 (Fri, 06 May 2016)

  Changed paths:
    M nixos/modules/system/boot/stage-1-init.sh

  Log Message:
  -----------
  nixos/stage-1: Don't kill kernel threads

Unfortunately, pkill doesn't distinguish between kernel and user space
processes, so we need to make sure we don't accidentally kill kernel
threads.

Normally, a kernel thread ignores all signals, but there are a few that
do. A quick grep on the kernel source tree (as of kernel 4.6.0) shows
the following source files which use allow_signal():

  drivers/isdn/mISDN/l1oip_core.c
  drivers/md/md.c
  drivers/misc/mic/cosm/cosm_scif_server.c
  drivers/misc/mic/cosm_client/cosm_scif_client.c
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
  drivers/staging/rtl8188eu/core/rtw_cmd.c
  drivers/staging/rtl8712/rtl8712_cmd.c
  drivers/target/iscsi/iscsi_target.c
  drivers/target/iscsi/iscsi_target_login.c
  drivers/target/iscsi/iscsi_target_nego.c
  drivers/usb/atm/usbatm.c
  drivers/usb/gadget/function/f_mass_storage.c
  fs/jffs2/background.c
  fs/lockd/clntlock.c
  fs/lockd/svc.c
  fs/nfs/nfs4state.c
  fs/nfsd/nfssvc.c

While not all of these are necessarily kthreads and some functionality
may still be unimpeded, it's still quite harmful and can cause
unexpected side-effects, especially because some of these kthreads are
storage-related (which we obviously don't want to kill during bootup).

During discussion at #15226, @dezgeg suggested the following
implementation:

for pid in $(pgrep -v -f '@'); do
    if [ "$(cat /proc/$pid/cmdline)" != "" ]; then
  kill -9 "$pid"
    fi
done

This has a few downsides:

 * User space processes which use an empty string in their command line
   won't be killed.
 * It results in errors during bootup because some shell-related
   processes are already terminated (maybe it's pgrep itself, haven't
   checked).
 * The @ is searched within the full command line, not just at the
   beginning of the string. Of course, we already had this until now, so
   it's not a problem of his implementation.

I posted an alternative implementation which doesn't suffer from the
first point, but even that one wasn't sufficient:

for pid in $(pgrep -v -f '^@'); do
    readlink "/proc/$pid/exe" &> /dev/null || continue
    echo "$pid"
done | xargs kill -9

This one spawns a subshell, which would be included in the processes to
kill and actually kills itself during the process.

So what we have now is even checking whether the shell process itself is
in the list to kill and avoids killing it just to be sure.

Also, we don't spawn a subshell anymore and use /proc/$pid/exe to
distinguish between user space and kernel processes like in the comments
of the following StackOverflow answer:

http://stackoverflow.com/a/12231039

We don't need to take care of terminating processes, because what we
actually want IS to terminate the processes.

The only point where this (and any previous) approach falls short if we
have processes that act like fork bombs, because they might spawn
additional processes between the pgrep and the killing. We can only
address this with process/control groups and this still won't save us
because the root user can escape from that as well.

Signed-off-by: aszlig <[email protected]>
Fixes: #15226


  Commit: dc6d0030110d0ebd7f13a0fdaccfe494fbad0e29
      
https://github.com/NixOS/nixpkgs/commit/dc6d0030110d0ebd7f13a0fdaccfe494fbad0e29
  Author: aszlig <[email protected]>
  Date:   2016-05-06 (Fri, 06 May 2016)

  Changed paths:
    M nixos/tests/installer.nix

  Log Message:
  -----------
  nixos/tests/installer/swraid: Check for safemode

This is a regression test for #15226, so that the test will fail once we
accidentally kill one or more of the md kthreads (aka: if safe mode is
enabled).

Signed-off-by: aszlig <[email protected]>


  Commit: 4f796c28d57887cc9812190bc99fb45b2acd6d1c
      
https://github.com/NixOS/nixpkgs/commit/4f796c28d57887cc9812190bc99fb45b2acd6d1c
  Author: aszlig <[email protected]>
  Date:   2016-05-06 (Fri, 06 May 2016)

  Changed paths:
    M nixos/release.nix
    A nixos/tests/boot-stage1.nix

  Log Message:
  -----------
  nixos/tests: Add a test for boot stage 1

We already have a small regression test for #15226 within the swraid
installer test. Unfortunately, we only check there whether the md
kthread got signalled but not whether other rampaging processes are
still alive that *should* have been killed.

So in order to do this we provide multiple canary processes which are
checked after the system has booted up:

 * canary1: It's a simple forking daemon which just sleeps until it's
      going to be killed. Of course we expect this process to not
      be alive anymore after boot up.
 * canary2: Similar to canary1, but tries to mimick a kthread to make
      sure that it's going to be properly killed at the end of
      stage 1.
 * canary3: Like canary2, but this time using a @ in front of its
      command name to actually prevent it from being killed.
 * kcanary: This one is a real kthread and it runs until killed, which
      shouldn't be the case.

Tested with and without 67223ee and everything works as expected, at
least on my machine.

Signed-off-by: aszlig <[email protected]>


  Commit: eb6e366446ea00d92e2a39d2700dc4faa0a6bdf0
      
https://github.com/NixOS/nixpkgs/commit/eb6e366446ea00d92e2a39d2700dc4faa0a6bdf0
  Author: aszlig <[email protected]>
  Date:   2016-05-06 (Fri, 06 May 2016)

  Changed paths:
    M nixos/release-combined.nix

  Log Message:
  -----------
  nixos/release-combined: Add boot-stage1 test

We don't want to push out a channel update whenever this test fails,
because that might have unexpected and confused side effects and it
*really* means that stage 1 of our boot up is broken.

Signed-off-by: aszlig <[email protected]>


  Commit: 64ca91cac9b5dd520a736528a3f0a29ba1480593
      
https://github.com/NixOS/nixpkgs/commit/64ca91cac9b5dd520a736528a3f0a29ba1480593
  Author: aszlig <[email protected]>
  Date:   2016-05-06 (Fri, 06 May 2016)

  Changed paths:
    M nixos/tests/boot-stage1.nix

  Log Message:
  -----------
  nixos/tests/boot-stage1: Add myself to maintainers

As @edolstra pointed out that the kernel module might be painful to
maintain. I strongly disagree because it's only a small module and it's
good to have such a canary in the tests no matter how the bootup process
looks like, so I'm going the masochistic route and try to maintain it.

If it *really* becomes too much maintenance burden, we can still drop or
disable kcanary.

Signed-off-by: aszlig <[email protected]>


  Commit: e936f7dff6a42080e0af0687e6858160af16dd8c
      
https://github.com/NixOS/nixpkgs/commit/e936f7dff6a42080e0af0687e6858160af16dd8c
  Author: aszlig <[email protected]>
  Date:   2016-05-06 (Fri, 06 May 2016)

  Changed paths:
    M nixos/modules/system/boot/stage-1-init.sh
    M nixos/release-combined.nix
    M nixos/release.nix
    A nixos/tests/boot-stage1.nix
    M nixos/tests/installer.nix

  Log Message:
  -----------
  Merge branch 'stage1-dont-kill-kthreads'

Merges pull request #15275:

    This addresses #15226 and fixes killing of processes before
    switching from the initrd to the real root.

    Right now, the pkill that is issued not only kills user space
    processes but also sends a SIGKILL to kernel threads as well.
    Usually these threads ignore signals, but some of these processes do
    handle signals, like for example the md module, which happened in
    #15226.

    It also adds a small check for the swraid installer test and a
    standalone test which checks on just that problem, so in the future
    this shouldn't happen again.

This has been acked by @edolstra on IRC.


Compare: https://github.com/NixOS/nixpkgs/compare/f53850bf21bb...e936f7dff6a4
_______________________________________________
nix-commits mailing list
[email protected]
http://lists.science.uu.nl/mailman/listinfo/nix-commits

Reply via email to