17.09.2019 17:30, Ahmed S. Darwish wrote:
On Tue, Sep 17, 2019 at 08:11:56AM -0400, Theodore Y. Ts'o wrote:
On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
Willy Tarreau - 17.09.19, 07:24:38 CEST:
On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
Well, the patch actually made getrandom() return an error too, but
you seem more interested in the hypotheticals than in arguing
actualities.
If you want to be safe, terminate the process.

This is an interesting approach. At least it will cause bug reports in
applications using getrandom() in an unreliable way, and they will
check for other options. Because one of the issues with systems that
fail to finish booting is that usually the user doesn't know what
process is hanging.


I would be happy with a change which changes getrandom(0) to send a
kill -9 to the process if it is called too early, with a new flag,
getrandom(GRND_BLOCK) which blocks until entropy is available.  That
leaves it up to the application developer to decide what behavior they
want.
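
For context on what an application can already choose today: getrandom(2) accepts GRND_NONBLOCK, which makes the call return -1 with errno set to EAGAIN instead of sleeping while the CRNG is not yet initialized. A minimal sketch in C follows; the helper name fill_random_nonblocking is made up for illustration and is not part of any patch in this thread.

```c
#define _GNU_SOURCE             /* getrandom(2) prototype under strict -std modes */
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: fill buf with len random bytes without ever
 * sleeping on an uninitialized CRNG.
 *
 * Returns 0 on success. Returns -1 with errno == EAGAIN if the CRNG
 * is not ready yet, or -1 with another errno on any other failure.
 */
int fill_random_nonblocking(void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, GRND_NONBLOCK);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* EAGAIN or a real error */
        }
        p += n;                 /* short reads are possible for large requests */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that sees EAGAIN then has to decide for itself whether to wait, give up, or exit, which is exactly the decision being debated here.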


Yup, I'm convinced that's the sanest option too. I'll send a final RFC
patch tonight implementing the following:

config GETRANDOM_CRNG_ENTROPY_MAX_WAIT_MS
        int
        default 3000
        help
          Default max wait in milliseconds, for the getrandom(2) system
          call when asking for entropy from the urandom source, until
          the Cryptographic Random Number Generator (CRNG) gets
          initialized.  Any process exceeding this duration for entropy
          wait will get killed by the kernel. The maximum wait can be
          overridden through the "random.getrandom_max_wait_ms" kernel
          boot parameter. Rationale follows.

          When the getrandom(2) system call was created, it came with
          the clear warning: "Any userspace program which uses this new
          functionality must take care to assure that if it is used
          during the boot process, that it will not cause the init
          scripts or other portions of the system startup to hang
          indefinitely."

          Unfortunately, due to multiple factors, including not having
          this warning written in a scary enough language in the
          manpages, and due to glibc since v2.25 implementing a BSD-like
          getentropy(3) in terms of getrandom(2), modern user-space is
          calling getrandom(2) in the boot path everywhere.

          Embedded Linux systems were first hit by this, and reports of
          embedded systems "getting stuck at boot" began to be
          common. Over time, the issue began to even creep into consumer
          level x86 laptops: mainstream distributions, like Debian
          Buster, began to recommend installing haveged as a workaround,
          just to let the system boot.

          Filesystem optimizations in EXT4 and XFS exacerbated the
          problem, due to aggressive batching of IO requests, and thus
          minimizing sources of entropy at boot. This led to large
          delays until the kernel's Cryptographic Random Number
          Generator (CRNG) got initialized, and thus to reports of
          getrandom(2) being stuck indefinitely at boot.

          Solve this problem by setting a conservative upper bound for
          getrandom(2) wait. Kill the process, instead of returning an
          error code, because otherwise crypto-sensitive applications
          may revert to less secure mechanisms (e.g. /dev/urandom). We
          __deeply encourage__ system integrators and distribution
          builders not to considerably increase this value: during
          system boot, you either have entropy, or you don't. And if you
          don't have entropy, it will stay that way forever, because
          if you had, you wouldn't have blocked in the first place. It's
          an atomic "either/or" situation, with no middle ground. Please
          think twice.

          Ideally, systems would be configured with hardware random
          number generators, and/or configured to trust the CPU-provided
          RNG's (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
          (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
          should generate cryptographic keys only as late as possible,
          when they are needed, instead of during early boot.  (For
          non-cryptographic use cases, such as dictionary seeds or MIT
          Magic Cookies, other mechanisms such as /dev/urandom or
          random(3) may be more appropriate.)
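
To make the bounded-wait semantics in the help text concrete, here is a rough userspace approximation in C. It is only a sketch under stated assumptions, not the kernel patch: it polls getrandom(2) with GRND_NONBLOCK until a deadline instead of sleeping in the kernel, and it reports a timeout through errno rather than killing the process. The names now_ms and getrandom_bounded and the 10 ms polling interval are invented for illustration.

```c
#define _GNU_SOURCE             /* getrandom(2) prototype under strict -std modes */
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: read len random bytes, waiting at most
 * max_wait_ms for the CRNG to become ready.
 *
 * Returns 0 on success, -1 with errno == ETIMEDOUT if the CRNG was
 * still not initialized at the deadline, or -1 with another errno
 * on any other failure.
 */
int getrandom_bounded(void *buf, size_t len, long max_wait_ms)
{
    /* Assumed polling interval of 10 ms between attempts. */
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
    long long deadline = now_ms() + max_wait_ms;
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = getrandom(p, len, GRND_NONBLOCK);

        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n < 0 && errno != EAGAIN)
            return -1;          /* real error, not "CRNG not ready" */
        if (now_ms() >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }
        nanosleep(&pause, NULL);
    }
    return 0;
}
```

A caller wanting behavior closer to the proposal could call getrandom_bounded(key, sizeof key, 3000) and abort() when it fails with ETIMEDOUT, rather than falling back to a weaker source.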

Sounds good?

thanks,

--
Ahmed Darwish
http://darwish.chasingpointers.com


This would fail the litmus test that started this thread, re-explained below.

0. Linus applies your patch.
1. A kernel release happens, and it boots fine.
2. Ted Ts'o invents yet another brilliant ext4 optimization, and it gets merged.
3. Somebody discovers that the new kernel kills all his processes, up to and including gnome-session, and that's obviously a regression.
4. Linus is forced to revert (2), nobody wins.

--
Alexander E. Patrakov
