On 2026/1/19 15:53, Gao Xiang wrote:


On 2026/1/19 15:29, Christoph Hellwig wrote:
On Sat, Jan 17, 2026 at 12:21:16AM +0800, Gao Xiang wrote:
Hi Christoph,

On 2026/1/16 23:46, Christoph Hellwig wrote:
I don't really understand the fingerprint idea.  Files with the
same content will point to the same physical disk blocks, so that
should be a much better indicator than a fingerprint?  Also how does

Page cache sharing should apply to different EROFS
filesystem images on the same machine too, so the
physical disk block number idea cannot be applied
to this.

Oh.  That's kinda unexpected and adds another twist to the whole scheme.
So in that case the on-disk data actually is duplicated in each image
and then de-duplicated in memory only?  Ewwww...

On-disk deduplication is decoupled from this feature:

First of all:

 - Data within a single EROFS image is deduplicated
   (for example, EROFS supports extent-based
   chunks);


 - EROFS can share the same blocks in blobs (multiple
   devices) among different images, so that on-disk data
   can be shared by referring to the same blobs;

   This is like Docker layers: common data/layers
   can be kept in separate blobs;

Both deduplication methods above apply to the
golden images that are transferred over the wire.


 - For on-disk data that isn't deduplicated in the images:
   if reflink is enabled on the backing filesystems,
   userspace mounters can trigger background GCs to
   deduplicate the identical blocks.

   This method is applied at runtime if the underlying
   filesystem supports reflink.
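
As a rough illustration of what such a background GC pass might look
for, the sketch below only identifies candidate identical blocks across
blob files by hashing fixed-size blocks.  The 4096-byte block size and
the name `find_duplicate_blocks` are assumptions made for this example,
not anything taken from erofs-utils.  The actual deduplication step (on
Linux that could be the FIDEDUPERANGE ioctl, which compares the ranges
itself before sharing them) is deliberately left out:

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def find_duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Return lists of (path, offset) locations whose full blocks hash the same.

    This only finds candidates; a real GC would still let the backing
    filesystem verify and share the ranges.
    """
    seen = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if len(block) < block_size:
                    break  # ignore a trailing partial block
                seen[hashlib.sha256(block).digest()].append((path, offset))
                offset += block_size
    # Keep only blocks that appear at more than one location.
    return [locs for locs in seen.values() if len(locs) > 1]
```

Each returned group lists every location of one repeated block, so a
mounter could pick the first entry as the source and the rest as
dedupe targets.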


I was just trying to say that EROFS doesn't limit what
`fingerprint` actually means (it could be serialized
integer numbers defined by a specific image publisher,
for example, or a specific secure hash; currently,
"mkfs.erofs" generates a sha256 for each file), but
leaves that to the image builders:


1) If `fingerprint` is distributed as an on-disk part of
signed images, as I said, it could be shared within a
trusted domain_id (usually the same image builder) --
that is the top-priority case, using dm-verity;

Or

2) If `fingerprint` is not distributed in the image
or images are untrusted (e.g. unknown signatures),
image fetchers can scan each inode in the golden
images to generate an extra minimal EROFS
metadata-only image with locally calculated
`fingerprint` too, which is very similar to the
current ostree way (parse remote files and calculate
digests).
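
To make case 2) a bit more concrete, here is a minimal sketch of such a
local scan.  It assumes the image content is available as an unpacked
directory tree and that SHA-256 is used as the fingerprint (which is
what mkfs.erofs currently generates); `fingerprint_tree` and
`share_groups` are hypothetical names for illustration only, and the
grouping by (domain_id, fingerprint) is just a userspace model of the
sharing rule, not the kernel implementation:

```python
import hashlib
import os
from collections import defaultdict


def file_fingerprint(path, bufsize=1 << 20):
    """Return the SHA-256 hex digest of a file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def fingerprint_tree(root):
    """Map each regular file under root (relative path) to its fingerprint."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(full) and not os.path.islink(full):
                result[os.path.relpath(full, root)] = file_fingerprint(full)
    return result


def share_groups(images):
    """Group files by (domain_id, fingerprint).

    images: iterable of (domain_id, image_name, root) tuples.
    Only groups with more than one member are returned, since only
    those would be candidates for sharing.
    """
    groups = defaultdict(list)
    for domain_id, image_name, root in images:
        for rel, fp in fingerprint_tree(root).items():
            groups[(domain_id, fp)].append((image_name, rel))
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Note that files with the same content but different domain_ids end up
in different groups, so nothing is shared across trust domains, and
physical block numbers play no role in the key at all.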

Thanks,
Gao Xiang

