On Tue, Oct 16, 2018 at 2:13 AM, Anand Jain <anand.j...@oracle.com> wrote:
>
>
> On 10/14/2018 06:28 AM, Chris Murphy wrote:
>>
>> Is it practical and desirable to make Btrfs based OS installation
>> images reproducible? Or is Btrfs simply too complex and
>> non-deterministic? [1]
>>
>> The main three problems with Btrfs right now for reproducibility are:
>> a. many objects have uuids other than the volume uuid; and mkfs only
>> lets us set the volume uuid
>> b. atime, ctime, mtime, otime; and no way to make them all the same
>> c. non-deterministic allocation of file extents, compression, inode
>> assignment, logical and physical address allocation
>>
>> I'm imagining reproducible image creation would be a mkfs feature that
>> builds on Btrfs seed and --rootdir concepts to constrain Btrfs
>> features to maybe make reproducible Btrfs volumes possible:
>>
>> - No raid
>> - Either all objects needing uuids can have those uuids specified by
>> switch, or possibly a defined set of uuids expressly for this use
>> case, or possibly all of them can just be zeros (eek? not sure)
>> - A flag to set all times the same
>> - Possibly require that target block device is zero filled before
>> creation of the Btrfs
>> - Possibly disallow subvolumes and snapshots
>> - Require the resulting image is seed/ro and maybe also a new
>> compat_ro flag to enforce that such Btrfs file systems cannot be
>> modified after the fact.
>> - Enforce a consistent means of allocation and compression
>>
>> The end result is creating two Btrfs volumes would yield image files
>> with matching hashes.
>
>
>> If I had to guess, the biggest challenge would be allocation. But it's
>> also possible that such an image may have problems with "sprouts". A
>> non-removable sprout seems fairly straightforward and safe; but if a
>> "reproducible build" type of seed is removed, it seems like removal
>> needs to be smart enough to refresh *all* uuids found in the sprout: a
>> hard break from the seed.
>
>
> Right. The seed fsid will be gone in a detached sprout.

I think we already get a new devid, volume uuid, and device uuid. The
open question is whether any other uuids need to be refreshed, such as
the chunk uuid, since that appears in every node and leaf.

>> Any thoughts? Useful? Difficult to implement?
>
> Recently Nikolay sent a patch to change fsid on a mounted btrfs. However for
> a reproducible builds it also needs neutralized uuids, time, bytenr(s)
> further more though the ondisk layout won't change without notice but
> block-bytenr might.

It seems like the mkfs population method for such a seed could be made
very deterministic as to what the start logical address and physical
address are. The vast majority of non-deterministic behavior comes
from the nature of kernel code having to handle so many complex inputs
and outputs, and negotiate them.

> One question why not reproducible builds get the file data extents from the
> image and stitch the hashes together to verify the hash. And there could be
> a vfs ioctl to import and export filesystem images for a better
> support-ability of the use-case similar to the reproducible builds.

Perhaps. I don't know the reproducible build requirements very well:
whether all they really care about is the hash of the data extents,
and how important fs metadata really is. Metadata is important when it
comes to fuzzing file systems that have no metadata checksumming, like
squashfs; of course you'd have to checksum the whole file system image.

Another feature the mkfs variety of seed image would need is
deduplication. As far as I know, deduplication is kernel code only.
You'd want to be able to deduplicate, as well as compress, to have the
smallest distributed seed possible. And mksquashfs does deduplication
by default.

--
Chris Murphy
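
For illustration only, the "stitch the hashes together" idea quoted above could look roughly like the sketch below. It is not anything from btrfs-progs or the kernel: it walks a mounted (or --rootdir style) directory tree rather than reading Btrfs extents, and the function name tree_digest is made up for this example. It hashes each regular file's contents, then combines the per-file hashes with the relative paths in sorted order, so uuids, timestamps, and block addresses do not affect the result.

```python
import hashlib
import os


def tree_digest(root):
    """Return a SHA-256 hex digest over relative paths and file contents.

    Hypothetical helper: only regular files are covered. Timestamps,
    ownership, uuids, and on-disk layout are deliberately ignored.
    """
    outer = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is fixed
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files in this sketch
            inner = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB blocks to keep memory use bounded.
                for block in iter(lambda: f.read(1 << 20), b""):
                    inner.update(block)
            rel = os.path.relpath(path, root)
            outer.update(rel.encode("utf-8", "surrogateescape"))
            outer.update(b"\0")  # separator between path and content hash
            outer.update(inner.digest())
    return outer.hexdigest()
```

Two images built from the same inputs should give the same tree_digest() for their mounted contents even if the raw image files differ. This checks the data only; it says nothing about whether the fs metadata matches, which is the part of the requirement I'm unsure about.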