On Thu, 6 Jul 2023 at 09:42, Trent W. Buck <trentb...@gmail.com> wrote:
>
> "Trent W. Buck" <trentb...@gmail.com> writes:
> > e.g. I expect "SystemCallArchitectures=native" to break for a lot of
> > people (anyone doing dpkg --add-architecture)
>
> Short version:
>
>   • SystemCallArchitectures=native + debianutils:i386 doesn't break dpkg-db-backup.service.
>   • Probably savelog simply never calls an ARCHITECTURE-SPECIFIC syscall.
>
>   • SystemCallArchitectures=native + nginx:i386 DOES break nginx.service.
>   • Neither journalctl nor coredumpctl makes it obvious this is WHY nginx crashed.
>
>
> Boring detailed version follows.
>
> I tried to trigger this (SystemCallArchitectures=native vs.
> dpkg --add-architecture) just now, and I can't!
>
> On an amd64 Debian 12 VM, I tried
>
>     dpkg --add-architecture i386
>     apt update
>     apt install --allow-remove-essential debianutils:i386 debianutils:amd64-
>     systemctl edit dpkg-db-backup
>
>        # Adding these:
>        [Service]
>        ReadWritePaths=/var/backups
>        CapabilityBoundingSet=
>        NoNewPrivileges=yes
>        PrivateDevices=yes
>        ProtectClock=yes
>        ProtectKernelLogs=yes
>        ProtectControlGroups=yes
>        ProtectKernelModules=yes
>        SystemCallArchitectures=native
>
>     systemctl start dpkg-db-backup
>     systemctl status dpkg-db-backup
>
> It seems to be running savelog:i386 happily.
>
> Then I tried a completely alien architecture,
> in case i386-on-amd64 was somehow special:
>
>     dpkg --add-architecture arm64
>     apt update
>     apt install mg:arm64 qemu-user-static
>     systemctl edit dpkg-db-backup
>
>        # Adding these:
>        [Service]
>        ExecStart=
>        ExecStart=mg tmp.txt
>        [Service]
>        ReadWritePaths=/var/backups
>        CapabilityBoundingSet=
>        NoNewPrivileges=yes
>        PrivateDevices=yes
>        ProtectClock=yes
>        ProtectKernelLogs=yes
>        ProtectControlGroups=yes
>        ProtectKernelModules=yes
>        SystemCallArchitectures=native
>
>     systemctl start dpkg-db-backup
>     systemctl status dpkg-db-backup
>
>        mg[1552]: panic: standard input and output must be a terminal
>
> And that worked (in the sense that systemd ran mg enough for it to
> call printf).
>
> I also thought that it might not work in linux-image-cloud-amd64, so
> I switched to linux-image-amd64, but
> it didn't seem to help -- systemd wasn't blocking things.
>
> The main "user story" for SystemCallArchitectures=native is
> if an attacker replaces (say) /bin/sh with a compromised binary.
> Usually they use i386, so it works on both i386 and amd64 systems.
> So if you do SystemCallArchitectures=native on amd64, it SHOULD just go
> "haha no, this is i386, piss off".
>
> Ah OK, on rereading the manpage,
> https://manpages.debian.org/bookworm/systemd/systemd.exec.5.en.html#SystemCallArchitectures=
> it seems like this just blocks non-amd64 syscalls.
> So I guess a program like savelog doesn't trigger it, because
> it's so simple it never hits an architecture-specific syscall?
>
> Also (probably) when mg:arm64 transits through qemu-user-static,
> by the time the enforcing layer sees it, the syscalls are native
> amd64 syscalls.
>
> Let's test a more complicated program, like nginx:i386...
> OK, I can make that fail.  Phew!  I thought I was going mad.
>
>     root@main:~# systemctl show -p SystemCallArchitectures nginx
>     SystemCallArchitectures=native
>
>     root@main:~# systemctl start nginx
>     Job for nginx.service failed because a fatal signal was delivered causing the control process to dump core.
>     See "systemctl status nginx.service" and "journalctl -xeu nginx.service" for details.
>
>     root@main:~# systemctl status nginx
>     × nginx.service - A high performance web server and a reverse proxy server
>          Loaded: loaded (/lib/systemd/system/nginx.service; enabled; preset: enabled)
>         Drop-In: /etc/systemd/system/nginx.service.d
>                  └─hardening.conf
>          Active: failed (Result: core-dump) since Thu 2023-07-06 18:32:40 AEST; 3s ago
>        Duration: 2min 32.918s
>            Docs: man:nginx(8)
>         Process: 2919 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=dumped, signal=SYS)
>             CPU: 2ms
>
>     Jul 06 18:32:40 main.lan systemd[1]: Starting nginx.service - A high performance web server and a reverse proxy server...
>     Jul 06 18:32:40 main.lan systemd[1]: nginx.service: Control process exited, code=dumped, status=31/SYS
>     Jul 06 18:32:40 main.lan systemd[1]: nginx.service: Failed with result 'core-dump'.
>     Jul 06 18:32:40 main.lan systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.
>
>     root@main:~# coredumpctl
>     TIME                          PID UID GID SIG    COREFILE EXE              SIZE
>     Thu 2023-07-06 18:32:40 AEST 2919   0   0 SIGSYS present  /usr/sbin/nginx 27.1K
>
>     root@main:~# coredumpctl info
>                PID: 2919 (nginx)
>                UID: 0 (root)
>                GID: 0 (root)
>             Signal: 31 (SYS)
>          Timestamp: Thu 2023-07-06 18:32:40 AEST (13s ago)
>       Command Line: /usr/sbin/nginx -t -q -g $'daemon on; master_process on;'
>         Executable: /usr/sbin/nginx
>      Control Group: /system.slice/nginx.service
>               Unit: nginx.service
>              Slice: system.slice
>            Boot ID: 8ee087fb77d9486d82ac6457ee7568ff
>         Machine ID: e6ee154bf2474fc9ab9b193c672b5f5c
>           Hostname: main.lan
>            Storage: /var/lib/systemd/coredump/core.nginx.0.8ee087fb77d9486d82ac6457ee7568ff.2919.1688632360000000.zst (present)
>       Size on Disk: 27.1K
>            Message: Process 2919 (nginx) of user 0 dumped core.
>
> Normally there would be a backtrace in coredumpctl's output,
> showing the call stack at the point where it made the blocked syscall.
> I'm not sure why that's absent here, but it makes it very hard to go

Seccomp is complex because it is very granular. When there is no
sandboxing at all, I recommend starting with the simplest settings, such
as ProtectSystem= and PrivateTmp=/TemporaryFileSystem=. Having some
sandboxing is better than having none.
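
As a rough illustration of that kind of starting point, here is a minimal
sketch that writes a two-line drop-in into a scratch directory. The unit
name "foo.service" and the file name "hardening.conf" are hypothetical, and
the chosen values (ProtectSystem=full, PrivateTmp=yes) are only one
plausible first step, not a recommendation for any particular service.

```shell
#!/bin/sh
# Sketch: write a minimal hardening drop-in into a scratch directory.
# "foo.service" is a hypothetical unit name; nothing here touches /etc.
set -eu

dropin_dir="./foo.service.d"
mkdir -p "$dropin_dir"

cat > "$dropin_dir/hardening.conf" <<'EOF'
[Service]
# Mount /usr, /boot, /efi and /etc read-only for this service.
ProtectSystem=full
# Give the service its own private /tmp and /var/tmp.
PrivateTmp=yes
EOF

# Show the result so it can be reviewed before installing it.
cat "$dropin_dir/hardening.conf"
```

After review, the file would go under /etc/systemd/system/foo.service.d/,
followed by "systemctl daemon-reload" and a restart of the unit.
"systemd-analyze security foo.service" should then give a rough idea of
what is still exposed.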

Kind regards,
Luca Boccassi
