Package: libnvme1
Version: 1.3-1
Severity: important
Tags: patch

Dear Maintainer,

libnvme has a serious bug that, on some NVMe hardware, can trigger DMA
writes that overwrite memory of unrelated processes, resulting in random
crashes and other system stability issues.  This can be caused by simply
running `nvme list`.

This was very recently fixed upstream in
https://github.com/linux-nvme/libnvme/commit/a2b8e52e46cfd888ac5a48d8ce632bd70a5caa93
and
https://github.com/linux-nvme/libnvme/commit/68c6ffb11d40a427fc1fd70ac2ac97fd01952913.

I've been able to reproduce this on multiple systems that have
SKHynix_HFS256GD9TNI-L2B0B SSDs.  From recent commit descriptions in
libnvme and nvme-cli, it sounds like some NVMe devices DMA only in 4k
blocks, but libnvme would sometimes allocate a smaller buffer, which
can result in the DMA operation clobbering unrelated memory.
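
To illustrate the idea, here is a minimal sketch of a safer allocation.
It is not the upstream patch.  It assumes the device always transfers
whole 4k blocks, so every buffer is rounded up to a multiple of 4096
bytes and aligned to 4096 bytes.  The names DMA_BLOCK_SIZE,
dma_round_up and dma_buf_alloc are made up for this example and are not
part of the libnvme API.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the device DMAs in whole 4 KiB blocks, as the upstream
 * commit descriptions suggest. */
#define DMA_BLOCK_SIZE 4096u

/* Round len up to the next multiple of DMA_BLOCK_SIZE (0 stays 0). */
static size_t dma_round_up(size_t len)
{
    return (len + DMA_BLOCK_SIZE - 1) / DMA_BLOCK_SIZE * DMA_BLOCK_SIZE;
}

/* Hypothetical helper, not a libnvme function.  Returns a zeroed buffer
 * that is block-aligned and at least one block long, or NULL on
 * failure.  Release it with free(). */
static void *dma_buf_alloc(size_t len)
{
    size_t padded = dma_round_up(len ? len : 1);
    void *buf = NULL;

    /* Rounding up wrapped around for a huge request. */
    if (padded < len)
        return NULL;

    if (posix_memalign(&buf, DMA_BLOCK_SIZE, padded) != 0)
        return NULL;

    /* posix_memalign guarantees the requested alignment. */
    assert((uintptr_t)buf % DMA_BLOCK_SIZE == 0);

    memset(buf, 0, padded);
    return buf;
}
```

With a helper like this, a 512-byte request still gets a full 4096-byte
buffer, so a device that writes a whole block stays inside the
allocation instead of running into neighbouring heap memory.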

To reproduce:

1. Make sure the kernel isn't using IOMMU (e.g., boot with
intel_iommu=off).
2. Run `while nvme list; do sleep 0.1; done`.

Generally the nvme process will segfault or abort with an error within
a few iterations.  Example dmesg output when this happens:

[ 2238.591144] show_signal_msg: 6 callbacks suppressed
[ 2238.591150] nvme[1315]: segfault at 8 ip 00007fbf286748e9 sp 00007ffe4cbccb30 error 4 in libc.so.6[7fbf28603000+155000] likely on CPU 1 (core 1, socket 0)
[ 2238.591178] Code: 24 18 45 85 d2 0f 85 17 05 00 00 48 81 fb ff 03 00 00 76 20 43 8d 44 2d 0c 48 8d 44 c5 00 48 8b 10 48 8d 48 f0 48 39 ca 74 0a <48> 39 5a 08 0f 83 2b 05 00 00 41 8d 4d 01 43 8d 44 2d 0e 89 cf 48

If you keep running this, you'll also find that other processes start
crashing as well, usually with segfaults or weird shared library
failures.  I've seen sshd crash, firefox crash, systemd segfault, etc.
As an example, I recently saw sshd failing with this error:

Oct 26 19:46:27 challenger sshd[1361]: /usr/sbin/sshd: error while loading shared libraries: /lib/x86_64-linux-gnu/libnsl.so.2: unexpected PLT reloc type 0x00

I was able to trivially apply the two git commits listed above to
libnvme 1.3 in Bookworm, and this resolved the crash and memory
corruption caused by `nvme list`.  I'd recommend applying these
changes to libnvme in Bookworm, since the impact is pretty severe for
users who happen to own affected devices.

There have also been other recent memory alignment changes in libnvme
and nvme-cli.  It may be worth trying to backport more of these to
the Bookworm packages to avoid memory corruption during other nvme
operations.


-- System Information:
Debian Release: 12.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-13-amd64 (SMP w/6 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libnvme1 depends on:
ii  libc6        2.36-9+deb12u3
ii  libdbus-1-3  1.14.10-1~deb12u1
ii  libjson-c5   0.16-2
ii  libssl3      3.0.11-1~deb12u2

libnvme1 recommends no packages.

Versions of packages libnvme1 suggests:
ii  nvme-cli  2.4+really2.3-3

-- no debconf information
