Re: Which package is responsible for setting rlimits?

2021-02-02 Thread Ansgar
Hi, Simon,

Simon Richter writes:
> For systemd, resource limits should not be set by pam_limits, because
> pam_limits reads /etc/security/limits.conf, while the systemd ecosystem
> stores resource limits in the unit files.

Please read [1].

  [1]: https://lists.debian.org/debian-devel/2021/02/msg00014.html

> Teaching pam_limits to interrogate systemd would create a functional
> dependency between the PAM and systemd packages where we could only update
> them in lock step, so that would be a maintenance nightmare.

This is wrong.  Just like we don't have to update consumers and
producers of other things like /etc/resolv.conf in lock step.  Or users
and providers of libc.

>> 2. The defaults for resource limits on non-systemd systems are no longer
>>a good default and should be changed.
>
>>This is probably true for both system services and user
>>processes, so somewhat independent from the behavior of pam_limits.
>
> My expectation for a non-systemd system is that I have to explicitly
> configure anything but "unlimited", and I will do so if necessary.

So sysvinit should set the resource limits as high as possible?  Seems
like a reasonable change to sysvinit (as it doesn't do so currently as
far as I know; this thread arose from those limits being too low on
sysvinit systems after all).

> tl;dr: pam_limits is for non-systemd setups, and only gets in the way of
users configuring limits according to systemd.resource-limits(5). It should
> not do anything if pid 1 is systemd so it doesn't interfere, and be split
> off into a separate package along with its configuration file to reduce
> confusion.

This doesn't seem correct.

> The behaviour of copying rlimits from pid 1 in the absence of
> explicit configuration is hacky but good enough for the other init
> systems.

Nor does this, as otherwise we wouldn't have this thread at all,
which is about those defaults being too low for some applications.

Ansgar



Re: Which package is responsible for setting rlimits?

2021-02-02 Thread Simon Richter
Hi Ansgar,

On Mon, Feb 01, 2021 at 09:50:40PM +0100, Ansgar wrote:

> 1. Resource limits set for a system service (e.g. sshd) might not be
>appropriate for a user session opened by the system service.

>Debian's PAM patch seems to be targeted at dealing with this by
>defaulting to restore the "original" values (taken from a process
>assumed to be unconstrained, here pid 1): sshd might have resource
>limits enforced, but the user session calls PAM which lifts the
>limits by default.

>You argue this might not be a good idea as pid-1's limits are
>somewhat arbitrary (in particular when systemd is pid-1) and it might
>be a good idea to consider using some other default.

I wonder why we must lift them from another process instead of just
providing a hardcoded default; that just sounds hacky. For non-systemd
setups it still works, so I'm inclined to leave it like that for now, as
long as we find a more sensible solution for systemd-based setups.

For systemd, resource limits should not be set by pam_limits, because
pam_limits reads /etc/security/limits.conf, while the systemd ecosystem
stores resource limits in the unit files.

>(b) Have pam_limits query some other source for default values, for
>example get the DefaultLimit*= values systemd uses by default for
>system services or having pam_limits use some default values
>(i.e., duplicating the logic that sets DefaultLimit*= in
>systemd).

Teaching pam_limits to interrogate systemd would create a functional
dependency between the PAM and systemd packages where we could only update
them in lock step, so that would be a maintenance nightmare. The systemd
package already provides a PAM module, and this is the perfect place to
apply the configured session limits.

>(d) Have pam_limits default to just inheriting resource limits, that
>is revert the Debian-specific patch.  If an admin configures
>resource limits for system services that provide login services,
>but are not appropriate for user sessions, then the admin is
>responsible for increasing those by explicitly configuring
>pam_limits to raise them.

It'd probably make sense to ship a non-empty /etc/security/limits.conf, so
we never have to use the fallback.

> 2. The defaults for resource limits on non-systemd systems are no longer
>a good default and should be changed.

>This is probably true for both system services and user
>processes, so somewhat independent from the behavior of pam_limits.

My expectation for a non-systemd system is that I have to explicitly
configure anything but "unlimited", and I will do so if necessary. On most
machines, I won't bother as they aren't exposed to the Internet, there is
only a single user, and if I run "make -j" without specifying a number it's
either intentional or my own fault -- and anything with actual users on it
will likely need application specific limits anyway.

> 3. Init scripts cannot safely be called in arbitrary environments which
>can have arbitrary resource limits not appropriate for the service.

>To be safe, init scripts would need to explicitly set resource limits
>when invoked.  This is also just another bit of the environment that
>would need to be explicitly sanitized, but usually isn't.

In practice, the arbitrary environment is a root shell whose limits have
just been reset by pam_limits, so this is more of an issue for people who
have bad habits like leaving root shells open and not reboot testing their
setups.

Also, I occasionally explicitly restart a service with a different
environment for testing, and being able to do that from a shell by simply
setting a few values with ulimit and calling an init script is immensely
helpful there.

>But this is unrelated to what pam_limits does: even when an admin
>*explicitly* configures lower limits for user sessions, these limits
>*shouldn't* be applied to system services that just happen to be
>(re)started in a user session.

Admins shouldn't configure lower limits for root shells, because they share
these limits with a lot of other processes, so a runaway service can easily
lock out the root user by having the shell fail to start.

tl;dr: pam_limits is for non-systemd setups, and only gets in the way of
users configuring limits according to systemd.resource-limits(5). It should
not do anything if pid 1 is systemd so it doesn't interfere, and be split
off into a separate package along with its configuration file to reduce
confusion. The behaviour of copying rlimits from pid 1 in the absence of
explicit configuration is hacky but good enough for the other init systems.

   Simon



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Ansgar
Hi Simon,

I think there are three aspects in your mail: the behavior of
pam_limits, defaults for resource limits on legacy init systems and some
discussion of sysvinit scripts that seems unrelated:

1. Resource limits set for a system service (e.g. sshd) might not be
   appropriate for a user session opened by the system service.

   Debian's PAM patch seems to be targeted at dealing with this by
   defaulting to restore the "original" values (taken from a process
   assumed to be unconstrained, here pid 1): sshd might have resource
   limits enforced, but the user session calls PAM which lifts the
   limits by default.

   You argue this might not be a good idea as pid-1's limits are
   somewhat arbitrary (in particular when systemd is pid-1) and it might
   be a good idea to consider using some other default.

   Possibilities seem to include:

   (a) Continue as is.

   The limits applied by pam_limits by default might not be
   reasonable as they are intended for systemd's pid-1, not
   arbitrary other processes.

   (b) Have pam_limits query some other source for default values, for
   example get the DefaultLimit*= values systemd uses by default for
   system services or having pam_limits use some default values
   (i.e., duplicating the logic that sets DefaultLimit*= in
   systemd).

   (c) Have some way to query the kernel's initial resource limits and
   use that as default (but doing so would just imply (2.) below as
   this happens on sysvinit systems as far as I understand).

   (d) Have pam_limits default to just inheriting resource limits, that
   is revert the Debian-specific patch.  If an admin configures
   resource limits for system services that provide login services,
   but are not appropriate for user sessions, then the admin is
   responsible for increasing those by explicitly configuring
   pam_limits to raise them.

   As long as sshd, getty, gdm, ... have no explicit (lower)
   resource limits configured, the inherited limits would be
   reasonable by default.

   If sshd, getty, gdm, ... have similar resource limits on
   non-systemd systems, inheriting limits would also be reasonable
   to do there.

   I think (b) or (d) would be better than (a) which might still be
   better than (c).

2. The defaults for resource limits on non-systemd systems are no longer
   a good default and should be changed.

   This is probably true for both system services and user
   processes, so somewhat independent from the behavior of pam_limits.

3. Init scripts cannot safely be called in arbitrary environments which
   can have arbitrary resource limits not appropriate for the service.

   To be safe, init scripts would need to explicitly set resource limits
   when invoked.  This is also just another bit of the environment that
   would need to be explicitly sanitized, but usually isn't.

   But this is unrelated to what pam_limits does: even when an admin
   *explicitly* configures lower limits for user sessions, these limits
   *shouldn't* be applied to system services that just happen to be
   (re)started in a user session.

Ansgar



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Simon McVittie
On Mon, 01 Feb 2021 at 11:16:48 -0800, Russ Allbery wrote:
> pam_limits also does some things that are unrelated to starting services,
> such as setting up limits for interactive user sessions, and I think pure
> systemd systems still rely on that?

My understanding was that pam_limits is *only* for limits on interactive
user sessions.

The only overlap with system services is that interactive user sessions
are started by system services (system service: gdm, user session: GUI;
system service: sshd, user session: my shell; that sort of thing), and
because system services on sysvinit can be (re)started from the context
of an undefined execution environment, under sysvinit there's an extra
incentive for pam_limits to do its best to undo the effects of that
undefined execution environment and get back to something well-defined.

However, now that I look at
https://sources.debian.org/src/pam/1.4.0-2/debian/patches-applied/027_pam_limits_better_init_allow_explicit_root/
more closely, I can see that sysvinit services might not really be
the intended motivation here, because the patch description talks
about crossing session boundaries (su'ing from one user to another),
which is something that *also* happens from an undefined execution
environment. Moving from sysvinit to systemd doesn't actually help at
all in that case, because the execution environment is equally undefined
under systemd (it was *started* in a predictable state, of course, but the
user is free to adjust the rlimits for child processes in su's ancestry,
and su has to cope with that).

Even if the motivation is su'ing from one user to another, I don't see
anything in that patch that wouldn't have an equal effect when moving
from "the system" into a user session at login entry points (gdm -> a
GUI session, sshd -> a shell, etc.), and it's that situation that sets
the rlimits that get used in practice for user sessions.

smcv



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Russ Allbery
Simon Richter  writes:

> Absolutely. The vast majority of users have no need for encrypted swap,
> but might reasonably assume that secret keys are not written unencrypted
> to disk, especially not in a way that is likely to leave them there for
> weeks.

That is not a reasonable assumption.  If you don't have encrypted swap,
secret keys may be written unencrypted to disk.  The only way to solve
this problem is with encrypted swap.

If you tell someone something else, you're doing them a disservice,
because you're creating an expectation that will not be met by Linux.
Just to take the most obvious point, loads of programs on your system
(such as your web browser!) deal with secret keys, and approximately none
of them are locking memory.

> Expecting users to set up encrypted swap is a fairly steep requirement
> if all they want to do is keep a few kilobytes of secret data actually
> secret.

You do realize how easy it is to set up encrypted swap provided that you
don't use hibernate, right?

> The mlock privilege is largely relevant from a denial-of-service
> standpoint, so I think we come out ahead by allowing a program we trust
> with secret keys to theoretically create memory pressure (which still
> wouldn't spill secret keys to swap).

I would not be at all certain that the only kernel attack surface you're
exposing is denial-of-service.

-- 
Russ Allbery (r...@debian.org)  



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Ansgar
Simon Richter writes:
> On Mon, Feb 01, 2021 at 12:30:30PM -0500, Sam Hartman wrote:
>
>> It sounded like you were proposing that pam detect if systemd was pid1
>> and if so, then do what it does today otherwise inherit limits by
>> default.
>
> PAM itself doesn't need to detect anything, the individual modules are
> responsible for checking whether their requirements are met, and do
> something safe if not.
>
> The way I see it, we want a pam_systemd module that is responsible for
> applying *all* settings configured in systemd units, and that is kept in
> sync with the unit file parser, and the pam_limits module to handle the
> non-systemd setups.

Systemd doesn't manage much of the "process forked from ssh that is the
user's process", so there is no place to configure such a limit.

Also:

+---
| To raise the user's limits further, the available configuration
| mechanisms differ between operating systems, but typically require
| privileges. In most cases it is possible to configure higher per-user
| resource limits via PAM or by setting limits on the system service
| encapsulating the user's service manager, i.e. the user's instance of
| user@.service.
+---[ man:systemd.exec(5) ]

So changing this limit for user sessions is currently out-of-scope for
systemd and handled by pam_limits on Debian (or whatever else).

Ansgar



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Russ Allbery
Simon Richter  writes:

> The way I see it, we want a pam_systemd module that is responsible for
> applying *all* settings configured in systemd units, and that is kept in
> sync with the unit file parser, and the pam_limits module to handle the
> non-systemd setups.

My understanding is that if you're running systemd, systemd does all of
this, so there's nothing for the PAM module to do.  So I think this
proposal reduces to arguing that pam_limits should be disabled on systemd
systems.

I think going that direction has the merit of simplicity on
individual systemd systems (I personally like keeping all of a daemon's
configuration in one place), but there's a huge transition problem in
trying to do this at the Debian level.  A lot of people likely have limits
configured using the pam_limits mechanism and would need to move those
limits into unit files (and in some cases replace init scripts with unit
files so that they can do so).  That's not a transition that we can easily
help with, either.

pam_limits also does some things that are unrelated to starting services,
such as setting up limits for interactive user sessions, and I think pure
systemd systems still rely on that?  So I'm not sure this is as simple as
just disabling the module or having it do nothing if systemd is init.

I see five packages in Debian that ship files in /etc/security/limits.d,
which presumably would require changes in your proposed approach to add
the same settings to their relevant unit files:

corekeeper: /etc/security/limits.d/corekeeper.conf
libvma: /etc/security/limits.d/30-libvma-limits.conf
lizardfs-common: /etc/security/limits.d/10-lizardfs.conf
stenographer-common: /etc/security/limits.d/stenographer.conf
uhd-host: /etc/security/limits.d/uhd.conf
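For reference, files like those listed above use the limits.conf(5) format: one "domain type item value" entry per line, where a type of "-" sets both the soft and the hard limit. The following is a minimal Python sketch of reading such a file. parse_limits_conf is a hypothetical helper: it only splits fields, does not validate items or apply anything, and its treatment of "#" as starting a comment anywhere on a line is an assumption about how pam_limits reads these files.

```python
from typing import List, Tuple


def parse_limits_conf(text: str) -> List[Tuple[str, str, str, str]]:
    """Split limits.conf-style text into (domain, type, item, value) entries.

    '#' starts a comment (assumed); a type of '-' is expanded into separate
    'soft' and 'hard' entries, since it sets both limits.
    """
    entries: List[Tuple[str, str, str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"malformed entry: {raw!r}")
        domain, kind, item, value = fields
        for k in (("soft", "hard") if kind == "-" else (kind,)):
            entries.append((domain, k, item, value))
    return entries
```

Under the proposal being discussed, each such entry would need an equivalent Limit*= setting in the relevant unit file instead.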

-- 
Russ Allbery (r...@debian.org)  



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Russ Allbery
Simon McVittie  writes:
> On Mon, 01 Feb 2021 at 09:54:56 -0800, Russ Allbery wrote:

>> Does this serve any useful purpose?

> Honestly, probably not, but removing security hardening (however
> dubious) is a regression, and if I remove it I'm sure there'll be a CVE
> ID on the way shortly.

I would argue that removing the capability bit on that binary is a
security improvement rather than a regression.  I think it's more likely
there is an exploit path in the program than that someone's security will
be compromised by someone pulling keys from an unencrypted swap partition
(that couldn't have been just as easily compromised in some other way).

In general, protecting against attackers with physical access to your
system is not a realistic threat model for the average user, and if you're
not the average user and need to worry about this, you need to be using
disk encryption, not inconsistently-applied memory pinning.

My recollection is that Ferguson, et al. are quite dubious about memory
pinning approaches in _Cryptography Engineering_ because (a) you will
almost certainly not manage to pin all the memory that you need to pin
because keys get everywhere in a running program, and (b) the level of
additional complexity including security complexity is not worth the
dubious gains.

>> If someone cares about this type of security, they should put swap on
>> an encrypted file system

> Sure, you know that, and I know that, but existing systems don't have
> it.

I wonder if we could say something in the release notes or elsewhere to
encourage people to move in this direction.

Linux upstream doesn't seem very enthused about supporting hibernation, so
maybe we should similarly not be enthused about supporting hibernation and
just enable encrypted swap with ephemeral keys by default, with a warning.
If someone configures FDE, we would, of course, move swap into the FDE
scheme as well (thus enabling hibernation again).  If someone wants
hibernation without FDE, they can always turn off the ephemeral
encryption.

-- 
Russ Allbery (r...@debian.org)  



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Simon Richter
Hi Russ,

On Mon, Feb 01, 2021 at 09:54:56AM -0800, Russ Allbery wrote:

[keyring managers using mlock]

> Does this serve any useful purpose?

Absolutely. The vast majority of users have no need for encrypted swap, but
might reasonably assume that secret keys are not written unencrypted to
disk, especially not in a way that is likely to leave them there for weeks.

Expecting users to set up encrypted swap is a fairly steep requirement if
all they want to do is keep a few kilobytes of secret data actually secret.

> I think adding this capability to gnome-keyring-daemon makes the whole
> system less secure, not more secure, compared to using encrypted swap,
> since managing escalated privileges in a program is far more complicated
> and failure-prone.

The mlock privilege is largely relevant from a denial-of-service
standpoint, so I think we come out ahead by allowing a program we trust
with secret keys to theoretically create memory pressure (which still
wouldn't spill secret keys to swap).

   Simon



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Simon Richter
Hi Sam,

On Mon, Feb 01, 2021 at 12:30:30PM -0500, Sam Hartman wrote:

> It sounded like you were proposing that pam detect if systemd was pid1
> and if so, then do what it does today otherwise inherit limits by
> default.

PAM itself doesn't need to detect anything, the individual modules are
responsible for checking whether their requirements are met, and do
something safe if not.

The way I see it, we want a pam_systemd module that is responsible for
applying *all* settings configured in systemd units, and that is kept in
sync with the unit file parser, and the pam_limits module to handle the
non-systemd setups.

These two modules should never be active at the same time, but since it is
possible for local configuration to load both, this should be detected and
fixed up.

To my knowledge, pam_systemd already does nothing if the init system isn't
systemd, so all we'd need is for pam_limits to do nothing if the init
system *is* systemd -- and the same essentially for all the other PAM
services that have been subsumed in systemd.

Anything else would just create strong coupling between modules in two
separate packages, which would require either lots of compatibility code if
an API changes, or versioned dependencies in the package system, both
telltale signs of a monolith.

The PAM maintainers might decide to split off the PAM modules that do
nothing in systemd setups, so we can shrink the default installation, but
probably after bullseye.

   Simon



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Simon McVittie
On Mon, 01 Feb 2021 at 09:54:56 -0800, Russ Allbery wrote:
> Simon McVittie  writes:
> > The wider context here is that gnome-keyring-daemon, GNOME's
> > implementation of the org.freedesktop.Secrets interface, is currently
> > setcap cap_ipc_lock=ep so that it can mlock(2) secrets and stop them
> > from getting swapped out.
> 
> Does this serve any useful purpose?

Honestly, probably not, but removing security hardening (however dubious)
is a regression, and if I remove it I'm sure there'll be a CVE ID on the
way shortly.

> If someone cares about this type of
> security, they should put swap on an encrypted file system

Sure, you know that, and I know that, but existing systems don't have it.

smcv



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Russ Allbery
Simon McVittie  writes:

> The reason I ask about this is that I want to make sure we are setting
> rlimits, and in particular RLIMIT_MEMLOCK, to a realistic value for
> 2021.  The wider context here is that gnome-keyring-daemon, GNOME's
> implementation of the org.freedesktop.Secrets interface, is currently
> setcap cap_ipc_lock=ep so that it can mlock(2) secrets and stop them
> from getting swapped out. This is ineffective on systems that can
> hibernate, at which point everything (even locked memory) has to be
> written to swap in any case, but it's better than nothing.

Does this serve any useful purpose?  If someone cares about this type of
security, they should put swap on an encrypted file system, at which point
these machinations don't achieve much in the way of security.  Encrypted
swap works fine with hibernation, as long as you're willing to unlock the
drive when booting back up (which is an unavoidable requirement for any
sort of persistent encryption).

If you don't care about hibernation, it's trivial to configure swap to use
an ephemeral encryption key [1], which solves this problem more thoroughly
and completely and doesn't require each application to do complex security
configuration.

I think adding this capability to gnome-keyring-daemon makes the whole
system less secure, not more secure, compared to using encrypted swap,
since managing escalated privileges in a program is far more complicated
and failure-prone.
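The RLIMIT_MEMLOCK value matters here because, on Linux, a process without CAP_IPC_LOCK can only mlock(2) memory up to its RLIMIT_MEMLOCK soft limit in total. A rough Python check under that rule, ignoring page rounding and other accounting details; can_mlock_unprivileged is a hypothetical name, and the soft_limit parameter exists only so the logic can be exercised without touching real limits.

```python
import resource
from typing import Optional


def can_mlock_unprivileged(
    nbytes: int, already_locked: int = 0, soft_limit: Optional[int] = None
) -> bool:
    """Rough check: could a process without CAP_IPC_LOCK lock nbytes more?

    Uses the current RLIMIT_MEMLOCK soft limit unless soft_limit is given.
    Page rounding and other kernel accounting details are ignored.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return already_locked + nbytes <= soft_limit
```

With the file capability, the daemon is exempt from this check; without it, the inherited soft limit decides how much secret material it can keep out of swap.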

[1] https://feeding.cloud.geek.nz/posts/encrypted-swap-partition-on/

-- 
Russ Allbery (r...@debian.org)  



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Sam Hartman
> "Simon" == Simon McVittie  writes:

Simon> On Mon, 01 Feb 2021 at 11:49:25 -0500, Sam Hartman wrote:
>> > "Simon" == Simon McVittie  writes:
>> I'm assuming that the proposal is to change this for bookworm.

Simon> I'm sorry, I don't have a concrete proposal, and I don't
Simon> understand which package is meant to be responsible for this
Simon> well enough to write one.

It sounded like you were proposing that pam detect if systemd was pid1
and if so, then do what it does today otherwise inherit limits by
default.

Assuming the PAM maintainer wants to support alternative init systems in
a context bigger than development, that sounds like a fine option.
Detecting whether pid1 is systemd might be tricky for pam; I haven't
thought about what that does for (build) dependencies, but we could
figure something out.
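One possible way to do the detection without a build dependency on libsystemd: sd_booted(3) documents that it simply checks whether the directory /run/systemd/system/ exists, so a module could in principle perform the same test itself. A minimal Python sketch under that assumption; booted_with_systemd is a hypothetical name and the path argument is only there for testing.

```python
import os


def booted_with_systemd(runtime_dir: str = "/run/systemd/system") -> bool:
    """Same test that sd_booted(3) documents: systemd is the running init
    system if its runtime directory exists."""
    return os.path.isdir(runtime_dir)
```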

However, this change seems too late for bullseye given where we are in
the freeze.



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Simon McVittie
On Mon, 01 Feb 2021 at 11:49:25 -0500, Sam Hartman wrote:
> > "Simon" == Simon McVittie  writes:
> I'm assuming that the proposal is to change this for bookworm.

I'm sorry, I don't have a concrete proposal, and I don't understand which
package is meant to be responsible for this well enough to write one.

At the moment, glib2.0 is in a position where a recent change caused
regressions for some system configurations, but if I revert the change,
then I'm undoing security hardening. I don't like either of those options,
and I'm hoping there is a better way.

smcv



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Sam Hartman
> "Simon" == Simon McVittie  writes:

I'm assuming that the proposal is to change this for bookworm.
It seems like it's too late in the process to change something like this
for bullseye without more explicit and significant harm documented than
you have given so far.

Simon> Rationale: on sysvinit or runit systems, pid 1 is very simple
Simon> and is unlikely to need to elevate any limits, but sysadmins
Simon> are expected to restart system services in an unpredictable
Simon> execution environment (certainly true for systemd, I'm not so
Simon> sure for runit). On systemd systems, pid 1 is more complex,
Simon> but part of the value we get for that complexity is that even
Simon> when sysadmins restart system services, the service receives
Simon> a known and predictable execution environment, so it does not
Simon> need to be robust against inheriting a wrong rlimit or other
Simon> parameters.

At a project level, I mostly don't buy this rationale in the context of
the GR we passed last year.

My reading of that GR is that running alternative init systems for end
users is not a project-level goal.
It may be a goal of individual package maintainers.
Supporting development of alternative init systems is a project level
goal, or at least was at the time we voted on the GR.

So, in terms of how the project thinks about this, I think the question
should be how much the behavior of accepting defaults from init
systems would negatively impact the work of someone trying to develop a new
init system.

At one level, they could certainly configure PAM if the particular
situation was unusual.
At another level, if the limits that pam is likely to inherit are going
to be sufficiently broken to hinder normal work, that's probably not
good.

I actually think that in most cases inheriting limits would be
acceptable for development, even if it did add some uncertainty for
production use.
I also think that a credible replacement to systemd is going to need to
provide a way to configure resource limits and to allow administrators
to restart services from pid 1 rather than from a random context.

So, I think that by the time development of an alternate init system
progresses to a point where it is being considered by the project as a
credible replacement for systemd, inheriting limits is likely to work
for that system.



It may well be that the PAM maintainer wishes to support sysvinit or
other alternate init systems in contexts broader than the development of
an init system.
I don't know; that seems like a decision for the PAM maintainer rather
than debian-devel.

If that is true, then your proposed solution seems reasonable.
If not, then perhaps we should just drop our patch.



Re: Which package is responsible for setting rlimits?

2021-02-01 Thread Simon McVittie
On Mon, 01 Feb 2021 at 13:58:57 +, Simon McVittie wrote:
> Rationale: on sysvinit or runit systems, pid 1 is very simple and is
> unlikely to need to elevate any limits, but sysadmins are expected
> to restart system services in an unpredictable execution environment
> (certainly true for systemd, I'm not so sure for runit).

Sorry, that should of course read: "certainly true for *sysvinit*,
I'm not so sure for runit".

smcv



Which package is responsible for setting rlimits?

2021-02-01 Thread Simon McVittie
A recent regression in gnome-keyring (perhaps only on systems that
use dbus-x11, it isn't completely clear to me yet) has prompted me to
look at how rlimits work in Debian. It isn't clear to me which package
is or should be responsible for choosing what arbitrary limits we use
in practice.

The kernel has some defaults, which it sets on pid 1. Some are hard-coded,
but increasingly many seem to be dependent on system state (for example
limiting memory sizes to a fixed fraction of system RAM). Traditionally,
when the init system was extremely minimal and delegated the majority
of its responsibilities to child processes (sysvinit or similar),
these defaults would be inherited by pid 1's children, and recursively
inherited by user processes.
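A minimal illustration of that inheritance, using Python's resource module: a limit set in a process after fork() but before exec is what the newly executed program sees. Here preexec_fn stands in for a parent such as init or a login service; RLIMIT_NOFILE and the value 64 are arbitrary choices for the sketch, and child_soft_nofile is a hypothetical helper.

```python
import resource
import subprocess
import sys

# The child program just reports the soft RLIMIT_NOFILE it was started with.
CHILD = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def child_soft_nofile(soft: int) -> int:
    """Set RLIMIT_NOFILE in a forked child, exec a new Python there and
    return the soft limit that program reports."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)  # an unprivileged soft limit cannot exceed hard

    def set_limit() -> None:
        # Runs after fork() and before execve(); rlimits survive the exec.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        preexec_fn=set_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

The calling process's own limits are left unchanged; only the child and anything it later starts see the new value.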

In principle, the pam_limits.so module sets rlimits for user
processes. However, by default it is unconfigured, and in the
absence of configuration it needs to default to *something* - either
inheriting from its parent process, or resetting the limits to something
predictable. Inheriting from its parent process is problematic because the
parent process might have reset its limits internally, and in a sysvinit
world it might have been restarted by a sysadmin in an arbitrary execution
environment, leading to unpredictable limits in user processes; but
resetting the limits is also problematic, because it results in PAM
having to second-guess the limits coming from the kernel, which presumably
knows better.

Debian's PAM package currently carries a non-upstream patch to
screen-scrape the rlimits of pid 1 and use them as a guess at what the
kernel's defaults must have been. This makes perfect sense in a sysvinit
world, where sysvinit hardly does anything (the real work of booting the
system is all delegated to sysv-rc) and therefore is unlikely to need
to raise its rlimits; but it doesn't really make sense under systemd,
where pid 1 does a significant amount, and raises its rlimits accordingly.
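As a rough idea of what "screen-scraping" pid 1's limits involves, here is a Python sketch that parses /proc/1/limits. It is not the Debian patch's code. That /proc/1/limits is the source, the fixed 25-column name field, and the helper names are all assumptions of this sketch.

```python
import resource
from typing import Dict, Tuple


def _to_rlim(value: str) -> int:
    # /proc prints RLIM_INFINITY as the word "unlimited".
    return resource.RLIM_INFINITY if value == "unlimited" else int(value)


def parse_proc_limits(text: str) -> Dict[str, Tuple[int, int]]:
    """Parse /proc/<pid>/limits text into {limit name: (soft, hard)}.

    Assumes the kernel's layout: a 25-column name field (names contain
    spaces), then soft limit, hard limit and an optional unit column.
    """
    limits: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines()[1:]:  # line 0 is the column header
        if not line.strip():
            continue
        name = line[:25].strip()
        soft, hard = line[25:].split()[:2]
        limits[name] = (_to_rlim(soft), _to_rlim(hard))
    return limits


def pid1_limits() -> Dict[str, Tuple[int, int]]:
    """Limits of pid 1, used as a guess at the kernel's defaults."""
    with open("/proc/1/limits") as f:
        return parse_proc_limits(f.read())
```

Under systemd, the values returned by pid1_limits() reflect whatever pid 1 raised its own limits to, which is exactly why they are a poor guess there.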

systemd *also* has configurable default limits to be passed down to
system services (see DefaultLimitMEMLOCK, etc. in /etc/systemd/system.conf).

How is this meant to work, and is it working as intended in practice?
If I'm understanding correctly, upstream it's meant to go something
like this, with more-indented components selectively overriding
less-indented components:

kernel ->
  (kernel defaults)
  init ->
    (systemd's configuration, if using systemd)
    system service providing an entry point ->
      PAM stack, pam_limits.so ->
        (pam_limits configuration, if used)
        user sessions

but because sysadmins of sysvinit systems are expected to run
"service foo restart" in an unknown execution environment,
our patched PAM changes this to:

kernel ->
  (kernel defaults)
  init ->
    (systemd's configuration, if using systemd)
    system service providing an entry point ->
      PAM stack, pam_limits.so ->
        (PAM's best guess at what the limits *should have been*)
        (pam_limits configuration, if used)
        user sessions
    system service providing an entry point ->
      ... sysadmin's arbitrary login session... ->
        system service restarted by sysadmin ->
          PAM stack, pam_limits.so ->
            (PAM's best guess at what the limits *should have been*)
            (pam_limits configuration, if used)
            user sessions

I wonder whether the solution ought to involve something like this:

* On non-systemd-booted systems, PAM continues to screen-scrape limits
  from pid 1 for compatibility with the "service foo restart" use-case;
* On systemd systems, PAM stops doing that, and inherits from the parent
  process by default, resulting in user processes getting the limits
  configured in pam_limits (if set), or if not set there, then the limits
  from systemd system.conf (if set), or if not set there either, the limits
  from the kernel
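The proposed precedence can be modelled as a small function. This is only a model of the two bullets above, not PAM code: the parameter names are hypothetical, None means "not configured", and it assumes the login service sets no limit of its own (for example via LimitMEMLOCK= in its unit file).

```python
from typing import Optional


def first_set(*candidates: Optional[int]) -> Optional[int]:
    """Return the first candidate that is configured (not None)."""
    for value in candidates:
        if value is not None:
            return value
    return None


def proposed_session_limit(
    systemd_is_pid1: bool,
    pam_limits_conf: Optional[int],
    system_conf_default: Optional[int],
    kernel_default: int,
    pid1_limit: int,
) -> Optional[int]:
    """Limit a user session would end up with under the proposal above."""
    if systemd_is_pid1:
        # Inherit from the parent: the login service received systemd's
        # DefaultLimit*= value if set, otherwise the kernel default.
        inherited = first_set(system_conf_default, kernel_default)
        return first_set(pam_limits_conf, inherited)
    # Non-systemd: keep resetting to pid 1's limits, as the patch does today.
    return first_set(pam_limits_conf, pid1_limit)
```

In both branches an explicit pam_limits setting still wins; the proposal only changes the fallback.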

Rationale: on sysvinit or runit systems, pid 1 is very simple and is
unlikely to need to elevate any limits, but sysadmins are expected
to restart system services in an unpredictable execution environment
(certainly true for systemd, I'm not so sure for runit). On systemd
systems, pid 1 is more complex, but part of the value we get for that
complexity is that even when sysadmins restart system services, the
service receives a known and predictable execution environment, so it
does not need to be robust against inheriting a wrong rlimit or other
parameters.

See also #917374, #976373, #923312.

The reason I ask about this is that I want to make sure