On 2026/1/19 17:38, Gao Xiang wrote:
On 2026/1/19 17:22, Christoph Hellwig wrote:
On Mon, Jan 19, 2026 at 04:52:54PM +0800, Gao Xiang wrote:
To me this sounds pretty scary, as we have code in the kernel's trust
domain that heavily depends on arbitrary userspace policy decisions.
For example, overlayfs metacopy can also point to
arbitrary files, what's the difference between them?
https://docs.kernel.org/filesystems/overlayfs.html#metadata-only-copy-up
By using metacopy, overlayfs can access arbitrary files
as long as the metacopy has the pointer, so it should
be a privileged feature, which is similar to this feature.
Sounds scary too. But overlayfs' job is to combine underlying files, so
it is expected. I think it's the mix of erofs being a disk based file
But you still could point to an arbitrary page cache
if metacopy is used.
system, and reaching out beyond the device(s) assigned to the file system
instance that makes me feel rather uneasy.
You mean the page cache can be shared from other
filesystems even if not backed by these devices/files?
I admit yes, there could be a difference: but that
is why the new "inode_share" and "domain_id" mount
options are used.
I think they should be regarded as a single super
filesystem if "domain_id" is the same: From the
security perspective much like subvolumes of
a single super filesystem.
And mounting a new filesystem within a "domain_id"
can be regarded as importing data into the super
"domain_id" filesystem, and I think only trusted
data within the single domain can be mounted/shared.
Similarly the sharing of blocks between different file system
instances opens a lot of questions about trust boundaries and life
time rules. I don't really have good answers, but writing up the
Could you give more details about these? Since you
raised the questions but I have no idea what the threats
really come from.
Right now by default we don't allow any unprivileged mounts. Now
if people think that, say, erofs is safe enough and opt into that,
it needs to be clear what the boundaries of that are. For a file
system limited to a single block device those boundaries are
pretty clear. For file systems reaching out to the entire system
(or some kind of domain), the scope is much wider.
btw, I think it's indeed helpful to get the boundaries (even
from on-disk formats and runtime features).
But I have to clarify that a single EROFS filesystem instance won't
have access to random block devices or files.
The backing device or files are specified by users explicitly when
mounting, like:
mount -odevice=blob1,device=blob2,...,device=blobn-1 blob0 mnt
And these devices / files will all be opened once, at mount time,
no more than that.
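
To make the "explicit list, fixed at mount time" point concrete, here is a
minimal sketch of how such a command line could be assembled. The helper name
build_erofs_mount_cmd is hypothetical and only for illustration; it prints the
command rather than running it, and it uses nothing beyond the device= option
and the blob0/blob1/mnt names shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print, but do not run, the
# mount command for a primary image plus an explicit list of extra blobs.
# Usage: build_erofs_mount_cmd PRIMARY MOUNTPOINT [EXTRA_BLOB...]
build_erofs_mount_cmd() {
    primary=$1
    mountpoint=$2
    shift 2

    # One device=<path> entry per extra blob, joined with commas.
    opts=""
    for blob in "$@"; do
        opts="${opts:+$opts,}device=$blob"
    done

    if [ -n "$opts" ]; then
        printf 'mount -o %s %s %s\n' "$opts" "$primary" "$mountpoint"
    else
        printf 'mount %s %s\n' "$primary" "$mountpoint"
    fi
}

# Example: two extra blobs next to the primary image.
build_erofs_mount_cmd blob0 mnt blob1 blob2
```

The set of backing files is whatever the caller lists here; nothing in the
sketch lets the image itself name additional paths.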
May I ask the difference between one device/file and a group of
given devices/files? Especially for immutable usage.
Thanks,
Gao Xiang