Having been impressed by the current status of reproducible builds and the fact it looks like we're close to having the important pieces in Debian proper, I have started to have a look at how I could help out with this bug. I've done some poking around in the dak code, and think I have a vague idea of how to achieve what I think is wanted.
First, it is helpful to describe what I think is wanted. What I think we need is for the archive network to have, alongside the binary packages it contains, details of exactly how to build those binaries. This is, I believe, the information contained in the .buildinfo files.

This bug has previously talked about a tarball of .buildinfo files, presented as Buildinfos.tgz alongside the Packages file. From looking at the current architecture of dak I do not believe that this is an easy option. I propose instead a Buildinfo.xz (or gz or whatever) file, which is a single text file containing all of the buildinfo information that corresponds to the Packages list.

What is lost by this approach are the OpenPGP signatures that .buildinfo files can have on them. I appreciate this is an important part of the reproducible builds aim, but I believe one of its strengths is the ability for multiple separate package builds to attest that they have used that buildinfo information to build the exact same set of binary artefacts. This is not something that scales easily on the archive network and I think it is better served by a separate service; it would be possible to take the package snippet from the buildinfo file and sign that alone, uploading the signature to the attestation service. For "normal" Debian operation the usual archive signatures would provide a basic level of attestation of the chain of build information.

The rest of this mail continues on the above assumptions. If you do not agree with the above, the below is probably null and void, so ignore it and instead educate me about what the requirements are and I'll try and adjust my ideas based on that.

So. If a single Buildinfo.xz file is acceptable, with the attestation being elsewhere, I think this is doable without too much hackery in dak. There are some trade-offs to make though, and I need to check which are acceptable and which are viewed as too much.
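To make the proposed single-file format concrete before getting to the trade-offs, here is a minimal Python sketch. It assumes the Buildinfo file is a sequence of RFC822-style stanzas separated by blank lines, as in Packages, and that an external attestation would cover a digest of one stanza. The helper names, the field set (other than Installed-Build-Depends) and all package versions are illustrative only; nothing here is existing dak code.

```python
import hashlib
import lzma


def render_stanza(fields):
    """Render one build's fields as a 'Key: value' stanza (hypothetical layout)."""
    return "".join(f"{key}: {value}\n" for key, value in fields)


def build_index(stanzas):
    """Join stanzas with a blank line between them, as the Packages file does."""
    return "\n".join(stanzas)


def stanza_digest(stanza):
    """Digest of a single stanza; this is what an external service could sign."""
    return hashlib.sha256(stanza.encode("utf-8")).hexdigest()


# Made-up example data, not taken from any real build.
hello = render_stanza([
    ("Source", "hello"),
    ("Version", "2.10-1"),
    ("Architecture", "amd64"),
    ("Installed-Build-Depends", "gcc (= 5.2.1-22), make (= 4.0-8.2)"),
])
other = render_stanza([
    ("Source", "other"),
    ("Version", "1.0-1"),
    ("Architecture", "amd64"),
    ("Installed-Build-Depends", "gcc (= 5.2.1-22)"),
])

index = build_index([hello, other])
compressed = lzma.compress(index.encode("utf-8"))  # the bytes of Buildinfo.xz
```

Because the digest is taken over one stanza rather than the whole file, a signature made against it stays checkable however many other packages come and go in the index, provided the stanza itself is regenerated byte for byte.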
Firstly, there is currently no concept of "build ids" that I can see; essentially the primary key for a build is (source-package, architecture, version). This assumes we never have the same version of a package with different binaries produced; I understand there is sometimes skew between security + the main archive but it's not clear to me if this will continue to be the case when we're doing things reproducibly. Even if it's not, adding a simple build id doesn't actually help AFAICT.

Secondly, buildinfo files that I've seen so far include arch all .debs with the architecture .debs. I believe on the archive these should be separate; so a build + upload that includes arch all + arch amd64 (for example) debs will actually end up with an entry (for just the all debs) in the all Buildinfo.xz and an entry (for just the amd64 debs) in the amd64 Buildinfo.xz. Why? Binary NMUs, which don't rebuild the all .debs. Otherwise you end up changing the buildinfo information (to drop the rebuilt amd64 debs) or keeping around old buildinfo information (+ you have to track the fact you need it and know when to clean it up).

Thirdly, as the information is generated from a database, there needs to be a defined order in which the fields are generated. This is purely to ensure that the buildinfo information for each package is generated in a reproducible fashion so any external signatures remain valid over time.

If these are acceptable I think that projectb needs 2 additional tables: buildinfo_keys, similar to metadata_keys, and binaries_buildinfo, which would have a 3 column primary key of (source-package, architecture, version), and then key_id/value fields (similar to binaries_metadata) to hold the buildinfo information that is not already present elsewhere in the database. At present the main information these will hold is the Installed-Build-Depends field; the rest that I've actively seen is available already. Have I missed anything?
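A rough sketch of the two tables and the fixed-order output, using Python with an in-memory SQLite database purely as a stand-in for projectb (which is PostgreSQL). Several things here are assumptions rather than part of the proposal: the `ordering` column as the way to pin field order, the inclusion of key_id in the primary key so that one build can carry more than one field, the helper names, and the example versions. Sorting versions as plain text is only there to make the output deterministic; it is not Debian version ordering.

```python
import sqlite3

SCHEMA = """
CREATE TABLE buildinfo_keys (
    key_id   INTEGER PRIMARY KEY,
    key      TEXT NOT NULL UNIQUE,
    ordering INTEGER NOT NULL UNIQUE  -- hypothetical: pins field output order
);
CREATE TABLE binaries_buildinfo (
    source       TEXT NOT NULL,
    architecture TEXT NOT NULL,
    version      TEXT NOT NULL,
    key_id       INTEGER NOT NULL REFERENCES buildinfo_keys (key_id),
    value        TEXT NOT NULL,
    -- assumption: key_id is part of the key so a build can hold several fields
    PRIMARY KEY (source, architecture, version, key_id)
);
"""


def add_field(db, build, key, value):
    """Store one field for build = (source, architecture, version)."""
    row = db.execute(
        "SELECT key_id FROM buildinfo_keys WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        # New keys are appended to the end of the defined field order.
        next_order = db.execute(
            "SELECT COALESCE(MAX(ordering), 0) + 1 FROM buildinfo_keys"
        ).fetchone()[0]
        key_id = db.execute(
            "INSERT INTO buildinfo_keys (key, ordering) VALUES (?, ?)",
            (key, next_order),
        ).lastrowid
    else:
        key_id = row[0]
    db.execute(
        "INSERT INTO binaries_buildinfo VALUES (?, ?, ?, ?, ?)",
        (*build, key_id, value),
    )


def render(db, architecture):
    """Render all builds for one architecture, always in the same order."""
    rows = db.execute(
        """
        SELECT b.source, b.version, k.key, b.value
        FROM binaries_buildinfo b JOIN buildinfo_keys k USING (key_id)
        WHERE b.architecture = ?
        ORDER BY b.source, b.version, k.ordering
        """,
        (architecture,),
    )
    stanzas = {}
    for source, version, key, value in rows:
        header = (
            f"Source: {source}\nVersion: {version}\n"
            f"Architecture: {architecture}\n"
        )
        stanzas.setdefault((source, version), [header]).append(f"{key}: {value}\n")
    return "\n".join("".join(lines) for lines in stanzas.values())


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
# One upload containing all + amd64 debs becomes two separately keyed entries.
add_field(db, ("hello", "all", "2.10-1"), "Installed-Build-Depends", "gcc (= 5.2.1-22)")
add_field(db, ("hello", "amd64", "2.10-1"), "Installed-Build-Depends", "gcc (= 5.2.1-22)")
amd64_index = render(db, "amd64")
```

With the entries keyed per architecture, a binary NMU would only replace the amd64 rows and leave the all rows, and hence the all Buildinfo.xz stanza, untouched.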
I don't think the code to implement the above ends up particularly complex in dak, and the resulting Buildinfo.xz files should not add a particularly large amount of new data to the mirror network. The main loss is that of the attestation information as part of the mirror network (and actually, I can see a way we could add that as a buildinfo field that wasn't part of the signature at some point in the future).

(Additionally it is not clear to me where the dpkg status for buildinfo creation is; I have heard that it's close to happening, but I can't find anything on recent list archives about it - pointers appreciated!)

J.

-- 
/-\                             | I get the feeling that I've been
|@/  Debian GNU/Linux Developer |             cheated.
\-                              |