Package: redis-server
Version: 5:7.0.11-1
Severity: important
User: de...@kali.org
Usertags: origin-kali

Dear Maintainer,

After migrating an instance from bullseye to bookworm, Redis started to
crash roughly every two hours. I tracked it down to a change in the
systemd unit file: the setting ProcSubset=pid is what causes the issue.

Long story below.

As part of the Kali Linux infrastructure, we have a host that runs
mirrorbits, a geo redirector. Mirrorbits stores its data in a Redis
database.

Some quick numbers:

```
# free -h
         total  used  free  shared  buff/cache  available
Mem:      61Gi  24Gi  13Gi   612Ki        22Gi       37Gi
Swap:       0B    0B    0B

# redis-cli info | grep -E '(used_memory_(peak_)?human)'
used_memory_human:22.94G
used_memory_peak_human:23.24G
```

This instance is managed with Ansible. It is not in production yet.

It was running fine on Debian bullseye, then we re-deployed it on a
bookworm VM. On this new host, Redis crashes roughly every two hours:

```
# journalctl | grep redis | grep code=killed | tail
Oct 28 14:58:30 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
Oct 28 16:44:24 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
Oct 28 18:49:49 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
Oct 28 21:07:28 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
Oct 28 22:54:32 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
Oct 29 00:39:06 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
Oct 29 02:43:30 host systemd[1]: redis.service: Main process exited, code=killed, status=11/SEGV
```
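For what it's worth, the intervals between crashes can be computed from
those timestamps. A quick sketch (sample timestamps are inlined from the
journal excerpt above; on the host itself you'd feed it journalctl output
instead):

```shell
#!/bin/sh
# Print minutes elapsed between consecutive crash timestamps.
# GNU date (as on Debian) is assumed for `date -d`.
prev=""
while read -r ts; do
    cur=$(date -d "$ts" +%s)
    # Skip the first entry: no previous timestamp to diff against.
    [ -n "$prev" ] && echo "interval: $(( (cur - prev) / 60 )) min"
    prev=$cur
done <<'EOF'
2023-10-28 14:58:30
2023-10-28 16:44:24
2023-10-28 18:49:49
2023-10-28 21:07:28
EOF
```

This prints intervals of 105, 125 and 137 minutes, consistent with the
roughly-two-hour pattern.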

Looking at the Redis log files, we see this kind of line, repeated
hundreds of times within a few seconds, before Redis finally crashes:

```
85555:M 15 Oct 2023 01:55:55.811 # Out Of Memory allocating 24576 bytes!
```

The first thing I did was disable the RDB snapshot (set to run every
hour in our config), just to make sure the crash was not related to it.
It was not: Redis kept crashing.
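For reference, this is how snapshotting was disabled; in redis.conf, an
empty save directive replaces the hourly rule (the original rule shown
in the comment is illustrative, not our exact config):

```
# redis.conf: disable RDB snapshotting entirely
# (our config had an hourly rule along the lines of: save 3600 1)
save ""
```

The same can be done at runtime with `redis-cli CONFIG SET save ""`.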

Redis RAM usage is rather constant for us (between 22G and 24G), and on
this machine only mirrorbits and Redis are running. There's plenty of
RAM available: I monitored the RAM during a crash, and `free` reported
around 37G available. So I don't think we're running out of RAM.

I checked what changed in the Redis package, between bullseye and
bookworm, and this commit stands out:

d/redis-server.service: harden systemd service file
https://salsa.debian.org/lamby/pkg-redis/-/commit/8fec88c1

I tried reverting the systemd unit file to the bullseye version, and
Redis worked again, with no more crashes. From there I re-enabled the
hardening settings one by one, until I found the one that causes the
crash:

ProcSubset=pid

I'm not really knowledgeable about systemd hardening, but after reading
the documentation, it seems pretty clear that this setting is
questionable for Redis, and probably shouldn't be enabled.
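For anyone hitting this in the meantime, the setting can be overridden
locally with a drop-in, without touching the packaged unit file (a
sketch, assuming the unit is named redis-server.service as shipped by
the Debian package; ProcSubset=all is the documented default):

```
# /etc/systemd/system/redis-server.service.d/procsubset.conf
# (created e.g. via: systemctl edit redis-server)
[Service]
ProcSubset=all
```

Followed by `systemctl daemon-reload && systemctl restart redis-server`.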

Quoting systemd.exec(5):

If "pid", all files and directories not directly associated with process
management and introspection are made invisible in the /proc/ file
system configured for the unit's processes. [...] Note that Linux
exposes various kernel APIs via /proc/, which are made unavailable with
this setting. Since these APIs are used frequently this option is useful
only in a few, specific cases, and is not suitable for most non-trivial
programs.

At this point I think there's enough information to support disabling
ProcSubset=pid. Please tell me if you need more information; since the
issue is reproducible, it's easy for me to provide more logs.

Thanks in advance!

Arnaud
