On 9/1/25 12:26, James Gritton wrote:
> On 2025-08-31 20:27, Kyle Evans wrote:
>
>> An obvious elephant in the room here is filesystem access.  A capsule
>> would force an attacker to get a little more creative if they want to
>> tamper with capsule processes, in particular if it's combined with a
>> heightened securelevel (or removal of other features like /dev/mem
>> entirely), but it does not stop an attacker from filesystem tampering
>> to disrupt capsule activities.  This kind of leaves a huge part of
>> protecting itself up to application design, which arguably eliminates
>> many benefits of the idea.
>>
>> I don't really have a good answer for how one might solve that.  The
>> rest of the design is fairly straightforward to implement, but I would
>> rather suspect it might get hairy if you try to block off parts of the
>> filesystem (even from root, maybe contingent on securelevel) based on
>> whether the path has been used for a capsule or not.
>
> I'm not quite seeing the use case, which may obviate this, but the
> only safe filesystem setup is the jail having its own filesystem.  It
> could have other filesystems mounted, but no nullfs, unionfs, or nfs.
> The jail root could be protected similarly to the processes - outside
> unprivileged processes can't see past it, basically a virtual 700
> permission from outside even if it's something more traditional like
> 755 as seen inside the jail.


re: use case, I'm thinking of capsules as mostly useful from the
perspective of an appliance designed with discrete systems that want to
run with some level of isolation from other parts of the system.  One
obvious improvement we can make is to remove the ability to interact
with the processes inside via simple syscalls: no more ptrace, no
traditional jail_attach(2), no signalling.
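
To make that rule concrete, here is a rough userspace model of the check, not kernel code.  Every name in it (struct model_jail, the sealed flag, capsule_op_allowed) is made up for illustration and does not exist in FreeBSD; it also assumes all three operations are treated identically and ignores child jails entirely.

```c
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in FreeBSD. */
enum capsule_op { OP_PTRACE, OP_SIGNAL, OP_JAIL_ATTACH };

struct model_jail {
	int  id;      /* stand-in for a jail identifier */
	bool sealed;  /* hypothetical "this jail is a capsule" flag */
};

/*
 * Returns true if a process living in `actor` may perform `op` on a
 * process (or the jail itself) in `target`.
 */
static bool
capsule_op_allowed(const struct model_jail *actor,
    const struct model_jail *target, enum capsule_op op)
{
	(void)op;  /* assumption: all three operations get the same answer */

	if (!target->sealed)
		return true;  /* ordinary jail rules would apply; not modeled */

	/* Only processes already inside the capsule may touch it. */
	return actor->id == target->id;
}
```

With a host jail {0, false} and a capsule {5, true}, the host is refused for all three operations while processes inside the capsule can still signal each other.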

This is clearly not total isolation (and that's not a word I'd use in a
general-public description at all), but it suddenly forces your
attackers to think a bit harder: if I can't disrupt or peek at the black
box (which is presumably running critical services) held by that capsule
through the obvious traditional means, what *can* I do?  Can I do
something with /dev/mem?  Is the kernel debugger available?  Is it
easier to find some other access point to compromise the service if I
can only access it through whatever control socket the appliance might
have set up or, e.g., ssh/http/https/service it's running?  Am I stuck
with some side-channels to investigate inside?

Let's look at it from another angle: if I manage to compromise a service
that's running inside a capsule, how does this kind of trivial isolation
affect my life?  Can I identify processes that have been attached to my
jail and somehow use that to extract some useful information carried
over from pre-attach?  Can I find some other conduit back up to the
parent through things designed to attach and admin a jail?  Granted,
current tooling isn't designed in a way to open up any possibilities
there, but these are the kinds of things I will be wanting to avoid
having to spend any time considering (whether it's a thing that makes
sense in FreeBSD or not; my local version already has plenty of diff,
this one won't be much to slap on top).

re: having its own filesystem, good point.  Requiring the capsule path
to be a mountpoint and disallowing mounting over it / traversing past it
for processes outside of the capsule is maybe not super-invasive.  I'd
probably still allow other filesystems to be mounted inside prior to
sealing for cases where one knows what they're doing, but I wouldn't
expect that to be a common need.

- Jamie
Thanks,

Kyle Evans