Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data

2023-01-22 Thread Eric Biggers
Hi Michael,

On Tue, Jan 10, 2023 at 12:50:42PM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 10, 2023 at 04:34:49PM +0100, Jason A. Donenfeld wrote:
> > Hi Michael,
> > 
> > Could you queue up this patch and mark it as a fix for 7.2.1? It is a
> > straight-up bug fix for a 7.2 regression that's now affected several
> > users.
> 
> OK. In the future pls cc me if you want me to merge a patch. Thanks!
> 
> > - It has two Tested-by tags on the thread.
> > - hpa, the maintainer of the kernel side of this, confirmed on one of
> >   the various tributary threads that this approach is a correct one.
> > - It doesn't introduce any new functionality.
> > 
> > For your convenience, you can grab this out of lore here:
> > 
> >   https://lore.kernel.org/lkml/20221230220725.618763-1-ja...@zx2c4.com/
> > 
> > Or if you want to yolo it:
> > 
> >   curl 
> > https://lore.kernel.org/lkml/20221230220725.618763-1-ja...@zx2c4.com/raw | 
> > git am -s
> > 
> > It's now sat silent on the mailing list for a while. So let's please get
> > this committed and backported so that the bug reports stop coming in.
> > 

This patch still isn't on QEMU's master branch.  What happened to it?

- Eric



Re: [PATCH qemu v3] x86: don't let decompressed kernel image clobber setup_data

2023-01-04 Thread Eric Biggers
On Fri, Dec 30, 2022 at 11:07:25PM +0100, Jason A. Donenfeld wrote:
> The setup_data links are appended to the compressed kernel image. Since
> the kernel image is typically loaded at 0x10, setup_data lives at
> `0x10 + compressed_size`, which does not get relocated during the
> kernel's boot process.
> 
> The kernel typically decompresses the image starting at address
> 0x100 (note: there's one more zero there than the compressed image
> above). This usually is fine for most kernels.
> 
> However, if the compressed image is actually quite large, then
> setup_data will live at a `0x10 + compressed_size` that extends into
> the decompressed zone at 0x100. In other words, if compressed_size
> is larger than `0x100 - 0x10`, then the decompression step will
> clobber setup_data, resulting in crashes.
> 
> Visually, what happens now is that QEMU appends setup_data to the kernel
> image:
> 
>   kernel imagesetup_data
>|--|||
> 0x10  0x10+l1 0x10+l1+l2
> 
> The problem is that this decompresses to 0x100 (one more zero). So
> if l1 is > (0x100-0x10), then this winds up looking like:
> 
>   kernel imagesetup_data
>|--|||
> 0x10  0x10+l1 0x10+l1+l2
> 
>  d e c o m p r e s s e d   k e r n e l
>  
> |-|
> 0x100 
> 0x100+l3
> 
> The decompressed kernel seemingly overwriting the compressed kernel
> image isn't a problem, because that gets relocated to a higher address
> early on in the boot process, at the end of startup_64. setup_data,
> however, stays in the same place, since those links are self referential
> and nothing fixes them up.  So the decompressed kernel clobbers it.
> 
> Fix this by appending setup_data to the cmdline blob rather than the
> kernel image blob, which remains at a lower address that won't get
> clobbered.
> 
> This could have been done by overwriting the initrd blob instead, but
> that poses big difficulties, such as no longer being able to use memory
> mapped files for initrd, hurting performance, and, more importantly, the
> initrd address calculation is hard coded in qboot, and it always grows
> down rather than up, which means lots of brittle semantics would have to
> be changed around, incurring more complexity. In contrast, using cmdline
> is simple and doesn't interfere with anything.
> 
> The microvm machine has a gross hack where it fiddles with fw_cfg data
> after the fact. So this hack is updated to account for this appending,
> by reserving some bytes.
> 
> Cc: x...@kernel.org
> Cc: Philippe Mathieu-Daudé 
> Cc: H. Peter Anvin 
> Cc: Borislav Petkov 
> Cc: Eric Biggers 
> Signed-off-by: Jason A. Donenfeld 

For what it's worth:

Tested-by: Eric Biggers 

- Eric



Re: [PATCH v5 4/4] x86: re-enable rng seeding via SetupData

2022-12-23 Thread Eric Biggers
Hi Jason,

On Wed, Sep 21, 2022 at 11:31:34AM +0200, Jason A. Donenfeld wrote:
> This reverts 3824e25db1 ("x86: disable rng seeding via setup_data"), but
> for 7.2 rather than 7.1, now that modifying setup_data is safe to do.
> 
> Cc: Laurent Vivier 
> Cc: Michael S. Tsirkin 
> Cc: Paolo Bonzini 
> Cc: Peter Maydell 
> Cc: Philippe Mathieu-Daudé 
> Cc: Richard Henderson 
> Cc: Ard Biesheuvel 
> Acked-by: Gerd Hoffmann 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  hw/i386/microvm.c | 2 +-
>  hw/i386/pc_piix.c | 3 ++-
>  hw/i386/pc_q35.c  | 3 ++-
>  3 files changed, 5 insertions(+), 3 deletions(-)
> 

After upgrading to QEMU 7.2, Linux 6.1 no longer boots with some configs.  There
is no output at all.  I bisected it to this commit, and I verified that the
following change to QEMU's master branch makes the problem go away:

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b48047f50c..42f5b07d2f 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -441,6 +441,7 @@ static void pc_i440fx_8_0_machine_options(MachineClass *m)
 pc_i440fx_machine_options(m);
 m->alias = "pc";
 m->is_default = true;
+PC_MACHINE_CLASS(m)->legacy_no_rng_seed = true;
 }

I've attached the kernel config I am seeing the problem on.

For some reason, the problem also goes away if I disable CONFIG_KASAN.

Any idea what is causing this?

- Eric
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 6.1.0 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (GCC) 12.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=120200
CONFIG_CLANG_VERSION=0
CONFIG_AS_IS_GNU=y
CONFIG_AS_VERSION=23900
CONFIG_LD_IS_BFD=y
CONFIG_LD_VERSION=23900
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
CONFIG_CC_HAS_ASM_GOTO_TIED_OUTPUT=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_NO_PROFILE_FN_ATTR=y
CONFIG_PAHOLE_VERSION=0
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
# CONFIG_WERROR is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_CONTEXT_TRACKING=y
CONFIG_CONTEXT_TRACKING_IDLE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_CLOCKSOURCE_WATCHDOG_MAX_SKEW_US=100
# end of Timers subsystem

CONFIG_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y

#
# BPF subsystem
#
# CONFIG_BPF_SYSCALL is not set
# end of BPF subsystem

CONFIG_PREEMPT_BUILD=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
# CONFIG_SCHED_CORE is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# 

Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned

2022-11-03 Thread Eric Biggers
On Thu, Nov 03, 2022 at 04:26:14PM +, Eric Biggers wrote:
> > In other words, STATX_DIOALIGN is unusable from the start because we
> > don't know whether the information it returns is actually correct? :-/
> 
> That's a silly point of view.  STATX_DIOALIGN has only been in a released 
> kernel
> for a few weeks (v6.0), the bug is only in one edge case, and it will get 
> fixed
> quickly and backported to v6.0.y where users of 6.0 will get it.

Actually, scratch that.  STATX_DIOALIGN is in 6.1, not 6.0.  So it hasn't even
been released yet.  Upstream is currently on v6.1-rc3.

So thank you for reporting (or for not reporting?) this.  We'll make sure it
gets fixed before release :-)

- Eric



Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned

2022-11-03 Thread Eric Biggers
On Thu, Nov 03, 2022 at 10:52:43AM +0100, Kevin Wolf wrote:
> Am 02.11.2022 um 03:49 hat Eric Biggers geschrieben:
> > On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> > > On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > > > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> > > 
> > > Citation needed.  For direct I/O to block devices, the kernel's block 
> > > layer
> > > checks the alignment before the I/O is actually submitted to the 
> > > underlying
> > > block device.  See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> > > 
> > > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> > > 
> > > That "bug" seems to be based on a misunderstanding of the kernel source 
> > > code,
> > > and not any actual testing.
> > > 
> > > I just tested it, and the error code is EINVAL.
> > > 
> > 
> > I think I see what's happening.  The kernel code was broken just a few 
> > months
> > ago, in v6.0 by the commit "block: relax direct io memory alignment"
> > (https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets 
> > DIO
> > through when the user buffer is only aligned to the device's dma_alignment. 
> >  But
> > a dm-crypt device has a dma_alignment of 512 even when the crypto sector 
> > size
> > (and thus also the logical block size) is 4096.  So there is now a case 
> > where
> > misaligned DIO can reach dm-crypt, when that shouldn't be possible.
> > 
> > It also means that STATX_DIOALIGN will give the wrong value for
> > stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
> > STATX_DIOALIGN for block devices relies on the dma_alignment.
> 
> In other words, STATX_DIOALIGN is unusable from the start because we
> don't know whether the information it returns is actually correct? :-/

That's a silly point of view.  STATX_DIOALIGN has only been in a released kernel
for a few weeks (v6.0), the bug is only in one edge case, and it will get fixed
quickly and backported to v6.0.y where users of 6.0 will get it.

Basically every Linux API has been broken at one point or the other, but things
get fixed.  Direct I/O itself has been totally broken on some filesystems, so by
this argument why are you even using direct I/O?  Actually, why are you even
using Linux?  Just use one of those operating systems with no bugs instead.

Also note that your alternative is probing, which is super broken because many
filesystems fall back to buffered I/O for misaligned direct I/O instead of
returning an error.

So please cooperate with getting things fixed properly instead of continuing to
use broken workarounds.

> > I'll raise this on the linux-block and dm-devel mailing lists.  It
> > would be nice if people reported kernel bugs instead of silently
> > working around them...
> 
> I wasn't involved in this specific one, but in case you're wondering why
> I wouldn't have reported it either...
> 
> On one hand, I agree with you because I want bugs in my code reported,
> too, but on the other hand, it has also happened to me before that
> you're treated as the stupid userland developer who doesn't know how
> the kernel works and who should better have kept his ideas of how it
> should work to himself - which is not exactly encouraging to report
> things when you can just deal with the way they are. I wouldn't have
> been able to tell whether in the mind of the respective maintainers,
> -EINVAL is required behaviour or whether that was just a totally
> unreasonable assumption on our side. Erring on the safe side, I'll give
> up an assumption that obviously doesn't match reality.

The error code is documented in the read(2) and write(2) man pages.  So clearly
either the kernel was wrong or the man page was wrong.  Either way it needed to
be reported.

- Eric



Re: [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support

2022-11-01 Thread Eric Biggers
On Tue, Nov 01, 2022 at 03:00:31PM -0400, Stefan Hajnoczi wrote:
>  /* Let's try to use the logical blocksize for the alignment. */
> -if (probe_logical_blocksize(fd, >bl.request_alignment) < 0) {
> -bs->bl.request_alignment = 0;
> +if (!bs->bl.request_alignment) {
> +if (probe_logical_blocksize(fd, >bl.request_alignment) < 0) {
> +bs->bl.request_alignment = 0;
> +}
>  }
>  
>  #ifdef __linux__
> -/*
> - * The XFS ioctl definitions are shipped in extra packages that might
> - * not always be available. Since we just need the XFS_IOC_DIOINFO ioctl
> - * here, we simply use our own definition instead:
> - */
> -struct xfs_dioattr {
> -uint32_t d_mem;
> -uint32_t d_miniosz;
> -uint32_t d_maxiosz;
> -} da;
> -if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), ) >= 0) {
> -bs->bl.request_alignment = da.d_miniosz;
> -/* The kernel returns wrong information for d_mem */
> -/* s->buf_align = da.d_mem; */
> +if (!bs->bl.request_alignment) {

This patch changes the fallback code to make the request_alignment value from
probe_logical_blocksize() override the value from XFS_IOC_DIOINFO.  Is that
intentional?

> +if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), ) >= 0) {
> +bs->bl.request_alignment = da.d_miniosz;
> +/* The kernel returns wrong information for d_mem */
> +/* s->buf_align = da.d_mem; */

Has this bug been reported to the XFS developers (linux-...@vger.kernel.org)?

- Eric



Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned

2022-11-01 Thread Eric Biggers
On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> 
> Citation needed.  For direct I/O to block devices, the kernel's block layer
> checks the alignment before the I/O is actually submitted to the underlying
> block device.  See
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> 
> > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> 
> That "bug" seems to be based on a misunderstanding of the kernel source code,
> and not any actual testing.
> 
> I just tested it, and the error code is EINVAL.
> 

I think I see what's happening.  The kernel code was broken just a few months
ago, in v6.0 by the commit "block: relax direct io memory alignment"
(https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets DIO
through when the user buffer is only aligned to the device's dma_alignment.  But
a dm-crypt device has a dma_alignment of 512 even when the crypto sector size
(and thus also the logical block size) is 4096.  So there is now a case where
misaligned DIO can reach dm-crypt, when that shouldn't be possible.

It also means that STATX_DIOALIGN will give the wrong value for
stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
STATX_DIOALIGN for block devices relies on the dma_alignment.

I'll raise this on the linux-block and dm-devel mailing lists.  It would be nice
if people reported kernel bugs instead of silently working around them...

- Eric



Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned

2022-11-01 Thread Eric Biggers
On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.

Citation needed.  For direct I/O to block devices, the kernel's block layer
checks the alignment before the I/O is actually submitted to the underlying
block device.  See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306

> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290

That "bug" seems to be based on a misunderstanding of the kernel source code,
and not any actual testing.

I just tested it, and the error code is EINVAL.

- Eric



Re: Re: [PATCH v3 0/6] Support akcipher for virtio-crypto

2022-03-23 Thread Eric Biggers
On Wed, Mar 23, 2022 at 03:32:37PM +0800, zhenwei pi wrote:
> 
> On 3/23/22 13:17, Eric Biggers wrote:
> > On Wed, Mar 23, 2022 at 10:49:06AM +0800, zhenwei pi wrote:
> > > v2 -> v3:
> > > - Introduce akcipher types to qapi
> > > - Add test/benchmark suite for akcipher class
> > > - Seperate 'virtio_crypto: Support virtio crypto asym operation' into:
> > >- crypto: Introduce akcipher crypto class
> > >- virtio-crypto: Introduce RSA algorithm
> > > 
> > > v1 -> v2:
> > > - Update virtio_crypto.h from v2 version of related kernel patch.
> > > 
> > > v1:
> > > - Support akcipher for virtio-crypto.
> > > - Introduce akcipher class.
> > > - Introduce ASN1 decoder into QEMU.
> > > - Implement RSA backend by nettle/hogweed.
> > > 
> > > Lei He (3):
> > >crypto-akcipher: Introduce akcipher types to qapi
> > >crypto: Implement RSA algorithm by hogweed
> > >tests/crypto: Add test suite for crypto akcipher
> > > 
> > > Zhenwei Pi (3):
> > >virtio-crypto: header update
> > >crypto: Introduce akcipher crypto class
> > >virtio-crypto: Introduce RSA algorithm
> > 
> > You forgot to describe the point of this patchset and what its use case is.
> > Like any other Linux kernel patchset, that needs to be in the cover letter.
> > 
> > - Eric
> Thanks Eric for pointing this missing part.
> 
> This feature provides akcipher service offloading capability. QEMU side
> handles asymmetric requests via virtio-crypto devices from guest side, do
> encrypt/decrypt/sign/verify operations on host side, and return the result
> to guest.
> 
> This patchset implements a RSA backend by hogweed from nettle, it works
> together with guest patch:
> https://lkml.org/lkml/2022/3/1/1425

So what is the use case?

- Eric



Re: [PATCH v3 0/6] Support akcipher for virtio-crypto

2022-03-22 Thread Eric Biggers
On Wed, Mar 23, 2022 at 10:49:06AM +0800, zhenwei pi wrote:
> v2 -> v3:
> - Introduce akcipher types to qapi
> - Add test/benchmark suite for akcipher class
> - Seperate 'virtio_crypto: Support virtio crypto asym operation' into:
>   - crypto: Introduce akcipher crypto class
>   - virtio-crypto: Introduce RSA algorithm
> 
> v1 -> v2:
> - Update virtio_crypto.h from v2 version of related kernel patch.
> 
> v1:
> - Support akcipher for virtio-crypto.
> - Introduce akcipher class.
> - Introduce ASN1 decoder into QEMU.
> - Implement RSA backend by nettle/hogweed.
> 
> Lei He (3):
>   crypto-akcipher: Introduce akcipher types to qapi
>   crypto: Implement RSA algorithm by hogweed
>   tests/crypto: Add test suite for crypto akcipher
> 
> Zhenwei Pi (3):
>   virtio-crypto: header update
>   crypto: Introduce akcipher crypto class
>   virtio-crypto: Introduce RSA algorithm

You forgot to describe the point of this patchset and what its use case is.
Like any other Linux kernel patchset, that needs to be in the cover letter.

- Eric



Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Eric Biggers
On Wed, Feb 23, 2022 at 02:12:30PM +0100, Jason A. Donenfeld wrote:
> When a VM forks, we must immediately mix in additional information to
> the stream of random output so that two forks or a rollback don't
> produce the same stream of random numbers, which could have catastrophic
> cryptographic consequences. This commit adds a simple API, add_vmfork_
> randomness(), for that.
> 
> Cc: Theodore Ts'o 
> Cc: Jann Horn 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  drivers/char/random.c  | 58 ++
>  include/linux/random.h |  1 +
>  2 files changed, 59 insertions(+)
> 
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 536237a0f073..29d6ce484d15 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -344,6 +344,46 @@ static void crng_reseed(void)
>   }
>  }
>  
> +/*
> + * This mixes unique_vm_id directly into the base_crng key as soon as
> + * possible, similarly to crng_pre_init_inject(), even if the crng is
> + * already running, in order to immediately branch streams from prior
> + * VM instances.
> + */
> +static void crng_vm_fork_inject(const void *unique_vm_id, size_t len)
> +{
> + unsigned long flags, next_gen;
> + struct blake2s_state hash;
> +
> + /*
> +  * Unlike crng_reseed(), we take the lock as early as possible,
> +  * since we don't want the RNG to be used until it's updated.
> +  */
> + spin_lock_irqsave(_crng.lock, flags);
> +
> + /*
> +  * Also update the generation, while locked, as early as
> +  * possible. This will mean unlocked reads of the generation
> +  * will cause a reseeding of per-cpu crngs, and those will
> +  * spin on the base_crng lock waiting for the rest of this
> +  * operation to complete, which achieves the goal of blocking
> +  * the production of new output until this is done.
> +  */
> + next_gen = base_crng.generation + 1;
> + if (next_gen == ULONG_MAX)
> + ++next_gen;
> + WRITE_ONCE(base_crng.generation, next_gen);
> + WRITE_ONCE(base_crng.birth, jiffies);
> +
> + /* This is the same formulation used by crng_pre_init_inject(). */
> + blake2s_init(, sizeof(base_crng.key));
> + blake2s_update(, base_crng.key, sizeof(base_crng.key));
> + blake2s_update(, unique_vm_id, len);
> + blake2s_final(, base_crng.key);
> +
> + spin_unlock_irqrestore(_crng.lock, flags);
> +}
[...]
> +/*
> + * Handle a new unique VM ID, which is unique, not secret, so we
> + * don't credit it, but we do mix it into the entropy pool and
> + * inject it into the crng.
> + */
> +void add_vmfork_randomness(const void *unique_vm_id, size_t size)
> +{
> + add_device_randomness(unique_vm_id, size);
> + crng_vm_fork_inject(unique_vm_id, size);
> +}
> +EXPORT_SYMBOL_GPL(add_vmfork_randomness);

I think we should be removing cases where the base_crng key is changed directly
besides extraction from the input_pool, not adding new ones.  Why not implement
this as add_device_randomness() followed by crng_reseed(force=true), where the
'force' argument forces a reseed to occur even if the entropy_count is too low?

- Eric



Re: [PATCH RFC v1 1/2] random: add mechanism for VM forks to reinitialize crng

2022-02-23 Thread Eric Biggers
On Thu, Feb 24, 2022 at 01:54:54AM +0100, Jason A. Donenfeld wrote:
> On 2/24/22, Eric Biggers  wrote:
> > I think we should be removing cases where the base_crng key is changed
> > directly
> > besides extraction from the input_pool, not adding new ones.  Why not
> > implement
> > this as add_device_randomness() followed by crng_reseed(force=true), where
> > the
> > 'force' argument forces a reseed to occur even if the entropy_count is too
> > low?
> 
> Because that induces a "premature next" condition which can let that
> entropy, potentially newly acquired by a storm of IRQs at power-on, be
> bruteforced by unprivileged userspace. I actually had it exactly the
> way you describe at first, but decided that this here is the lesser of
> evils and doesn't really complicate things the way an intentional
> premature next would. The only thing we care about here is branching
> the crng stream, and so this does explicitly that, without having to
> interfere with how we collect entropy. Of course we *also* add it as
> non-credited "device randomness" so that it's part of the next
> reseeding, whenever that might occur.

Can you make sure to properly explain this in the code?

- Eric