On 2023-09-20, Lucas Nussbaum wrote:
> On 19/09/23 at 13:52 -0700, Vagrant Cascadian wrote:
>> * Looking forward and backwards at snapshots
>>
>> I do think that a more complete snapshot approach is probably better
>> than package-specific snapshots, and it might be worth doing
>> forward-looking snapshots of ftp.debian.org (and security.debian.org and
>> incoming.debian.org), in addition to trying to fill out all the missing
>> past snapshots to be able to attempt verification builds of older
>> packages, such as all of bookworm.
>>
>> Snapshotting the archive(s) multiple times per day, today, tomorrow, and
>> going forward will at least enable doing verification rebuilds of
>> packages starting from this point, with less immediate overhead than
>> trying to replicate the entire functionality or more complete history of
>> snapshot.debian.org.

In the meantime, I worked on a naive implementation of this, using
debmirror and btrfs snapshots (zfs or xfs are other likely candidates
for filesystem-level snapshots). It is working better than I expected!

It currently has snapshots for debian amd64 on bookworm,
bookworm-backports, bookworm-proposed-updates, trixie, sid and
experimental (or I guess, rc-buggy...), and debian-security for
bookworm-security, and (this might be a little redundant, but just in
case) also incoming.debian.org for most of the above codenames as well,
starting between September 20th and 22nd (with some gaps as I was
sorting out what was worth capturing; it currently does not include
debian-installer images, for example, and some generations missed
.udebs). Soon it will start capturing October, and beyond!

The machine it is running on happens to be very close to a Debian
mirror, which is helpful! It also seems to have caught some snapshot
generations that snapshot.debian.org missed!

I also tried to backfill some snapshots from snapshot.debian.org for
"debian" and "debian-security" for roughly the same codenames, with more
success than I expected, capturing all of September and edging into
August so far. I hope to get as far back as maybe June, so that anything
built since the bookworm release has relevant snapshots. It mostly
works, although once in a while I appear to trip some download limits
and it stalls out.

It is currently weighing in at about 550GB; each snapshot of the archive
for amd64+all+source weighs in under 330GB if I recall correctly... so
that is over a month's worth of snapshots for the cost of about two full
snapshots. Obviously, adding more architectures would dramatically
increase the space used (I would probably add arm64, armhf, i386,
ppc64el and riscv64 if I were to do this again).

I'm in the process of using this snapshot mirror, calling out to
grep-dctrl and dose-builddebcheck (look mom, no database!)
to generate apt sources.list entries pointing to the appropriate
snapshots for each .buildinfo from September, and eventually perform
verification builds for each of these. I think it covers roughly 6000
.buildinfo files, which is not nothing!

>> I wonder if having multiple snapshot.debian.org implementations might
>> actually be a desireable thing, as it is so essential to the ability to
>> do long-term reproducible builds verification builds, and having
>> additional independent snapshots could provide redundancy and the
>> ability to repair breakages if one of the services fails in some way.
>
> What is the state of efforts regarding alternate snapshot.d.o
> implementations?

The main one I was aware of:

  https://github.com/fepitre/debian-snapshot

I believe snapshot.reproducible-builds.org, which used this, is
currently on hiatus, but I hope to see that picked up again in 2024,
possibly with a different implementation...

> Has someone explored an implementation backed by S3-compatible storage,
> which would easily allow hosting it in a cloud?

No idea, but multiple options would be good! One would probably want to
use a lot of redundancy (multiple S3 providers, multiple "local"
mirrors, etc.), just because this sort of thing is so difficult to fix
retroactively (if possible at all)...

How difficult is it to implement deduplication with S3 storage? I saw a
few hits with a quick search...

live well,
  vagrant
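
For illustration only, here is a rough Python sketch of the "pick a
snapshot for a .buildinfo and emit a sources.list entry" step mentioned
above. It is not the grep-dctrl/dose-builddebcheck approach described in
this mail: it swaps in a much simpler heuristic (newest snapshot taken at
or before the Build-Date). The base URL, the directory layout and the
timestamp naming are all assumptions made up for the example.

```python
import bisect
from datetime import timezone
from email.utils import parsedate_to_datetime

# Hypothetical base URL and layout: <base>/<archive>/<timestamp>/
SNAPSHOT_BASE = "http://snapshots.example.org"  # assumption, not a real service


def parse_build_date(buildinfo_text):
    """Return the Build-Date field of a .buildinfo as an aware UTC datetime."""
    for line in buildinfo_text.splitlines():
        if line.startswith("Build-Date:"):
            value = line.split(":", 1)[1].strip()
            parsed = parsedate_to_datetime(value)
            if parsed.tzinfo is None:
                # A "-0000" offset parses as naive; treat it as UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed.astimezone(timezone.utc)
    raise ValueError("no Build-Date field found")


def pick_snapshot(build_date, snapshot_stamps):
    """Pick the newest snapshot taken at or before build_date.

    snapshot_stamps must be a sorted list of strings such as
    "20230921T083000Z" (a hypothetical naming scheme that sorts
    chronologically when compared as plain strings).
    """
    key = build_date.strftime("%Y%m%dT%H%M%SZ")
    index = bisect.bisect_right(snapshot_stamps, key)
    if index == 0:
        raise LookupError("no snapshot at or before " + key)
    return snapshot_stamps[index - 1]


def sources_list_entry(stamp, archive="debian", suite="sid", component="main"):
    """Format one apt sources.list line pointing at the chosen snapshot."""
    # check-valid-until=no lets apt accept an old, expired Release file.
    return "deb [check-valid-until=no] {}/{}/{}/ {} {}".format(
        SNAPSHOT_BASE, archive, stamp, suite, component)
```

A caller would read a .buildinfo, pass its text to parse_build_date(),
and feed the result plus the list of available snapshot timestamps to
pick_snapshot(). A real tool would still need to check that the chosen
snapshot actually contains the exact versions listed in
Installed-Build-Depends, which this heuristic does not do.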