On 9/1/25 12:26, James Gritton wrote:
> On 2025-08-31 20:27, Kyle Evans wrote:
>> An obvious elephant in the room here is filesystem access. A capsule
>> would force an attacker to get a little more creative if they want to
>> tamper with capsule processes, in particular if it's combined with a
>> heightened securelevel (or removal of other features like /dev/mem
>> entirely), but it does not stop an attacker from filesystem tampering
>> to disrupt capsule activities. This kind of leaves a huge part of
>> protecting itself up to application design, which arguably eliminates
>> many benefits of the idea.
>>
>> I don't really have a good answer for how one might solve that. The
>> rest of the design is fairly straightforward to implement, but I would
>> rather suspect it might get hairy if you try to block off parts of the
>> filesystem (even from root, maybe contingent on securelevel) based on
>> whether the path has been used for a capsule or not.
>
> I'm not quite seeing the use case, which may obviate this, but the
> only safe filesystem setup is the jail having its own filesystem. It
> could have other filesystems mounted, but no nullfs, unionfs, or nfs.
> The jail root could be protected similarly to the processes - outside
> unprivileged processes can't see past it, basically a virtual 700
> permission from outside even if it's something more traditional like
> 755 as seen inside the jail.

re: use case, I'm thinking of capsules as mostly useful from the
perspective of an appliance designed with discrete systems that want to
run with some level of isolation from other parts of the system. One
obvious improvement we can make is to remove the ability to interact
with the processes inside via simple syscalls: no more ptrace, no
traditional jail_attach(2), no signalling.
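
To make that rule concrete, here is a minimal userland model in Python.
It is not kernel code and not a proposed implementation: the Jail class,
the capsule flag and may_interact() are all hypothetical names, and the
assumed rule is simply "a process may ptrace, signal or attach only if
it is itself inside every capsule that encloses the target".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Jail:
    """Hypothetical model of a jail; 'capsule' is an assumed flag."""
    name: str
    parent: Optional["Jail"] = None
    capsule: bool = False


def ancestors_and_self(jail):
    """Yield the jail, then each parent up to the host."""
    while jail is not None:
        yield jail
        jail = jail.parent


def is_inside(inner, outer):
    """True if 'inner' is 'outer' itself or one of its descendants."""
    return any(j is outer for j in ancestors_and_self(inner))


def may_interact(actor, target):
    """May a process in 'actor' ptrace/signal/attach to one in 'target'?"""
    # Traditional jail visibility: the target must live in the actor's
    # jail or somewhere below it.
    if not is_inside(target, actor):
        return False
    # Assumed capsule rule: every capsule enclosing the target must also
    # enclose the actor, so the host cannot reach in from outside.
    for jail in ancestors_and_self(target):
        if jail.capsule and not is_inside(actor, jail):
            return False
    return True
```

With host = Jail("host") and cap = Jail("cap", host, capsule=True),
may_interact(host, cap) is False while may_interact(cap, cap) is True;
an ordinary child jail of the host remains reachable as before.
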

This is clearly not total isolation (and that's not a word I'd use in a
general-public description at all), but it suddenly forces your
attackers to think a bit harder: if I can't disrupt or peek at the black
box (which is presumably running critical services) held by that capsule
through the obvious traditional means, what *can* I do? Can I do
something with /dev/mem? Is the kernel debugger available? Is it easier
to find some other access point to compromise the service if I can only
access it through whatever control socket the appliance might have set
up or, e.g., ssh/http/https/service it's running? Am I stuck with some
side-channels to investigate inside?

Let's look at it from another angle: if I manage to compromise a service
that's running inside a capsule, how does this kind of trivial isolation
affect my life? Can I identify processes that have been attached to my
jail and somehow use that to extract some useful information carried
over from pre-attach? Can I find some other conduit back up to the
parent through things designed to attach and admin a jail? Granted,
current tooling isn't designed in a way to open up any possibilities
there, but these are the kinds of things I will be wanting to avoid
having to spend any time considering (whether it's a thing that makes
sense in FreeBSD or not; my local version already has plenty of diff,
this one won't be much to slap on top).

re: having its own filesystem, good point. Requiring the capsule path to
be a mountpoint and disallowing mounting over it / traversing past it
for processes outside of the capsule is maybe not super-invasive. I'd
probably still allow other filesystems to be mounted inside prior to
sealing for cases where one knows what they're doing, but I wouldn't
expect that to be a common need.
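
As a rough illustration of those two checks, here is a small Python
sketch. It is only a userland model under stated assumptions: can_seal()
and may_traverse() are hypothetical names, paths are assumed to be
absolute host paths, symlinks are ignored, and "traversing past" is read
as "reaching anything strictly beneath the capsule root".

```python
import os
import posixpath


def can_seal(capsule_root):
    """Assumed precondition: the capsule root must be a mountpoint."""
    return os.path.ismount(capsule_root)


def may_traverse(path, capsule_root, caller_in_capsule):
    """May a lookup of 'path' proceed for this caller?

    Both paths are assumed to be absolute host paths. Callers outside
    the capsule may see the root directory itself but nothing beneath
    it, roughly the 'virtual 700' described in the quoted text.
    """
    if caller_in_capsule:
        return True
    path = posixpath.normpath(path)
    root = posixpath.normpath(capsule_root)
    if path == root:
        return True
    # commonpath() compares whole components, so /jails/capx is not
    # treated as being beneath /jails/cap.
    return posixpath.commonpath([path, root]) != root
```

For example, may_traverse("/jails/cap/etc/passwd", "/jails/cap", False)
is denied, while the same lookup from inside the capsule, or a lookup of
"/jails/capx/file" from outside, is allowed. The paths here are made up
for the example.
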

> - Jamie

Thanks,
Kyle Evans