Is it practical and desirable to make Btrfs-based OS installation images reproducible? Or is Btrfs simply too complex and non-deterministic? [1]

The three main problems with Btrfs right now for reproducibility are:

a. Many objects have UUIDs other than the volume UUID, and mkfs only lets us set the volume UUID.
b. atime, ctime, mtime, otime: there is no way to make them all the same.
c. Non-deterministic allocation of file extents, compression, inode assignment, and logical and physical address allocation.

I'm imagining reproducible image creation would be a mkfs feature that builds on the Btrfs seed and --rootdir concepts, constraining Btrfs features to maybe make reproducible Btrfs volumes possible:

- No raid
- Either all objects needing UUIDs can have those UUIDs specified by switch, or possibly a defined set of UUIDs expressly for this use case, or possibly all of them can just be zeros (eek? not sure)
- A flag to set all times the same
- Possibly require that the target block device is zero filled before creation of the Btrfs
- Possibly disallow subvolumes and snapshots
- Require the resulting image is seed/ro, and maybe also a new compat_ro flag to enforce that such Btrfs file systems cannot be modified after the fact
- Enforce a consistent means of allocation and compression

The end result is that creating two Btrfs volumes from the same inputs would yield image files with matching hashes.

If I had to guess, the biggest challenge would be allocation. But it's also possible that such an image may have problems with "sprouts". A non-removable sprout seems fairly straightforward and safe; but if a "reproducible build" type of seed is removed, it seems like removal needs to be smart enough to refresh *all* UUIDs found in the sprout: a hard break from the seed.

Competing file systems: ext4 with make_ext4 fork, and squashfs. At the moment I'm thinking it might be easier to teach squashfs integrity checking than to make Btrfs reproducible. But then I also think restricting Btrfs features, and applying some requirements to constrain Btrfs to make it reproducible, really enhances the Btrfs seed-sprout feature.

Any thoughts? Useful? Difficult to implement?
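
For what it's worth, the "matching hashes" goal can be checked with nothing Btrfs-specific. Below is a minimal Python sketch, assuming two image files that were built from the same inputs. The function names and the 4096-byte comparison block size are my own choices for illustration, not anything mkfs provides. When the hashes differ, the offset of the first differing block gives a rough hint about which on-disk structure diverged.

```python
import hashlib

CHUNK = 1 << 20  # read images in 1 MiB chunks to keep memory use flat


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b, block_size=4096):
    """Return the byte offset of the first differing block, or None if identical.

    block_size is an assumption for illustration; it is not tied to any
    particular Btrfs sector or node size.
    """
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if block_a != block_b:
                return offset
            if not block_a:  # both files ended at the same point
                return None
            offset += block_size
```

With two hypothetical builds named build1.img and build2.img, the images are reproducible if sha256_of("build1.img") == sha256_of("build2.img"); otherwise first_difference("build1.img", "build2.img") says where to start looking.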

Squashfs might be a better fit for this use case *if* it can be taught about integrity checking. It does per file checksums for the purpose of deduplication but those checksums aren't retained for later integrity checking.

[1] problems of reproducible system images
https://reproducible-builds.org/docs/system-images/

[2] purpose and motivation for reproducible builds
https://reproducible-builds.org/

[3] who is involved?
https://reproducible-builds.org/who/#Qubes%20OS

--
Chris Murphy