On Thu, 6 Jul 2023 at 09:42, Trent W. Buck <trentb...@gmail.com> wrote:
>
> "Trent W. Buck" <trentb...@gmail.com> writes:
> > e.g. I expect "SystemCallArchitectures=native" to break for a lot of
> > people (anyone doing dpkg --add-architecture)
>
> Short version:
>
> • SystemCallArchitectures=native + debianutils:i386 doesn't break
>   dpkg-db-backup.service.
> • Probably savelog simply never calls an ARCHITECTURE-SPECIFIC syscall.
>
> • SystemCallArchitectures=native + nginx:i386 DOES break nginx.service.
> • Neither journalctl nor coredumpctl makes it obvious this is WHY nginx
>   crashed.
>
>
> Boring detailed version follows.
>
> I tried to trigger this (SystemCallArchitectures=native vs. dpkg
> --add-architecture) just now, and I can't!
>
> On an amd64 Debian 12 VM, I tried
>
>     dpkg --add-architecture i386
>     apt update
>     apt install --allow-remove-essential debianutils:i386 debianutils:amd64-
>     systemctl edit dpkg-db-backup
>
>     # Adding these:
>     [Service]
>     ReadWritePaths=/var/backups
>     CapabilityBoundingSet=
>     NoNewPrivileges=yes
>     PrivateDevices=yes
>     ProtectClock=yes
>     ProtectKernelLogs=yes
>     ProtectControlGroups=yes
>     ProtectKernelModules=yes
>     SystemCallArchitectures=native
>
>     systemctl start dpkg-db-backup
>     systemctl status dpkg-db-backup
>
> It seems to be running savelog:i386 happily.
>
> Then I tried a completely alien architecture,
> in case i386-on-amd64 was somehow special:
>
>     dpkg --add-architecture arm64
>     apt update
>     apt install mg:arm64 qemu-user-static
>     systemctl edit dpkg-db-backup
>
>     # Adding these:
>     [Service]
>     ExecStart=
>     ExecStart=mg tmp.txt
>     [Service]
>     ReadWritePaths=/var/backups
>     CapabilityBoundingSet=
>     NoNewPrivileges=yes
>     PrivateDevices=yes
>     ProtectClock=yes
>     ProtectKernelLogs=yes
>     ProtectControlGroups=yes
>     ProtectKernelModules=yes
>     SystemCallArchitectures=native
>
>     systemctl start dpkg-db-backup
>     systemctl status dpkg-db-backup
>
>     mg[1552]: panic: standard input and output must be a terminal
>
> And that worked (in the sense that systemd ran mg enough for it to call
> printf).
>
> I also thought that it might not work in linux-image-cloud-amd64, so
> I switched to linux-image-amd64, but
> it didn't seem to help -- systemd wasn't blocking things.
>
> The main "user story" for SystemCallArchitectures=native is
> if an attacker replaces (say) /bin/sh with a compromised binary.
> Usually they use i386, so it works on both i386 and amd64 systems.
> So if you do SystemCallArchitectures=native on amd64, it SHOULD just go
> "haha no, this is i386, piss off".
>
> Ah OK, on rereading the manpage,
> https://manpages.debian.org/bookworm/systemd/systemd.exec.5.en.html#SystemCallArchitectures=
> it seems like this just blocks non-amd64 syscalls.
> So I guess a program like savelog doesn't trigger it, because
> it's so simple it never hits an architecture-specific syscall?
>
> Also (probably) when mg:arm64 transits through qemu-user-static,
> by the time the enforcing layer sees it, the syscalls are native amd64
> syscalls.
>
> Let's test a more complicated program, like nginx:i386...
> OK, I can make that fail. Phew! I thought I was going mad.
>
> root@main:~# systemctl show -p SystemCallArchitectures nginx
> SystemCallArchitectures=native
>
> root@main:~# systemctl start nginx
> Job for nginx.service failed because a fatal signal was delivered causing the control process to dump core.
> See "systemctl status nginx.service" and "journalctl -xeu nginx.service" for details.
>
> root@main:~# systemctl status nginx
> × nginx.service - A high performance web server and a reverse proxy server
>      Loaded: loaded (/lib/systemd/system/nginx.service; enabled; preset: enabled)
>     Drop-In: /etc/systemd/system/nginx.service.d
>              └─hardening.conf
>      Active: failed (Result: core-dump) since Thu 2023-07-06 18:32:40 AEST; 3s ago
>    Duration: 2min 32.918s
>        Docs: man:nginx(8)
>     Process: 2919 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=dumped, signal=SYS)
>         CPU: 2ms
>
> Jul 06 18:32:40 main.lan systemd[1]: Starting nginx.service - A high performance web server and a reverse proxy server...
> Jul 06 18:32:40 main.lan systemd[1]: nginx.service: Control process exited, code=dumped, status=31/SYS
> Jul 06 18:32:40 main.lan systemd[1]: nginx.service: Failed with result 'core-dump'.
> Jul 06 18:32:40 main.lan systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.
>
> root@main:~# coredumpctl
> TIME                          PID UID GID SIG    COREFILE EXE              SIZE
> Thu 2023-07-06 18:32:40 AEST 2919   0   0 SIGSYS present  /usr/sbin/nginx 27.1K
>
> root@main:~# coredumpctl info
>            PID: 2919 (nginx)
>            UID: 0 (root)
>            GID: 0 (root)
>         Signal: 31 (SYS)
>      Timestamp: Thu 2023-07-06 18:32:40 AEST (13s ago)
>   Command Line: /usr/sbin/nginx -t -q -g $'daemon on; master_process on;'
>     Executable: /usr/sbin/nginx
>  Control Group: /system.slice/nginx.service
>           Unit: nginx.service
>          Slice: system.slice
>        Boot ID: 8ee087fb77d9486d82ac6457ee7568ff
>     Machine ID: e6ee154bf2474fc9ab9b193c672b5f5c
>       Hostname: main.lan
>        Storage: /var/lib/systemd/coredump/core.nginx.0.8ee087fb77d9486d82ac6457ee7568ff.2919.1688632360000000.zst (present)
>   Size on Disk: 27.1K
>        Message: Process 2919 (nginx) of user 0 dumped core.
>
> Normally there would be a backtrace in coredumpctl's output,
> indicating the last few syscalls it made before it made a blocked syscall.
> I'm not sure why that's absent here, but it makes it very hard to go
Seccomp is complex because it is very granular. When there is no
sandboxing at all, I recommend starting with the simplest options, such
as ProtectSystem= and PrivateTmp=/TemporaryFileSystem=. Having some
sandboxing is better than having none.

Kind regards,
Luca Boccassi
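As a concrete, hypothetical first step along those lines, a drop-in for the dpkg-db-backup unit from the quoted test might look like the sketch below (created, as in the quoted test, with "systemctl edit dpkg-db-backup"). It uses no seccomp options at all. It assumes, as the quoted drop-in does, that /var/backups is the only path the service needs to write; that assumption should be checked for the real unit before relying on it.

```ini
# Hypothetical drop-in for dpkg-db-backup.service: a first sandboxing
# step only, with no seccomp filtering (no SystemCallArchitectures=).
[Service]
# Mount the whole file system hierarchy read-only for this service,
# except the API file systems /dev, /proc and /sys.
ProtectSystem=strict
# Assumption: /var/backups is the only place the service writes to.
ReadWritePaths=/var/backups
# Give the service its own private /tmp and /var/tmp.
PrivateTmp=yes
```

Once that works, "systemd-analyze security dpkg-db-backup" prints a rough exposure score for the unit, which can help decide which further options, eventually including seccomp ones such as SystemCallArchitectures=, are worth adding one at a time.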