Re: [Reproducible-builds] Moving towards buildinfo on the archive network
Jonathan McDowell: > On Sat, Aug 20, 2016 at 03:13:00PM +, Ximin Luo wrote: >> I have trouble imagining what could make Buildinfo.tgz hard, but make >> Buildinfo.xz easy - could you explain this in more detail, please? > > Debian's archive information is largely stored within a database; things > like the Packages and Contents files are generated each archive run from > this database, rather than incrementally updating a file. It is easy to > generate a Buildinfo.xz file from information contained within the > database (I have some proof-of-concept code locally that does the > beginnings of this), but generating a tar file like you are describing > is either a case of storing each .buildinfo in the database and > generating the tar each run, or adding files to and deleting files from an existing > tarball. It seems overly intensive and doesn't really seem to scale. > >> Regarding the OpenPGP signatures, they are vital - but I also see no >> need to strip them in your model. From the point-of-view of the FTP >> archive, there is no immediate need to read or understand the contents >> of the buildinfo file. [*] It's just a dumb data blob; it shouldn't >> matter to Debian whether it's clearsigned or not. > > What I was trying to do with my proposal was turn it from being a dumb > data blob which didn't map easily onto the Debian infrastructure, to > something where almost all the information (everything except the actual > signature from the original builder) could be provided alongside the > binaries themselves, enabling people to have what they required to > confirm they could reproduce the builds themselves.
*I* think this is > incredibly useful, even if it doesn't achieve everything possible with > reproducible-builds, and I also think that it would provide a sound > basis for another Debian service (perhaps under debian.net to start > with) where multiple builders (starting with the original builder) would > be able to upload their claims, based directly off the buildinfo > information from the archive network. Yes, that's probably an extra > step for the original builder, but it also (to me) seems to be more > flexible and a stronger statement as multiple independent builders can > all confirm things in a single place. > > It sounds like this isn't compatible with where reproducible-builds is > heading though, so apologies for the noise. > I don't mean to suggest a database is not useful. I thought I was talking to ftp-masters through you, so I wanted to be very clear about the security properties we're aiming for, and to reach a common understanding about that first. But I'm not sure why you say it's incompatible - could you not also store the detached signatures within the database, and generate the original file (including signature) from this and the other information? The signatures are much smaller than the rest of the file. In fact, we do indeed have longer-term plans for Debian infrastructure to look into this data rather than treat it as a data blob - for example, buildds themselves could try to reproduce a given buildinfo uploaded by a DD, and send alerts about packages that can't be reproduced. (I hinted at this with the "more advanced" behaviours I mentioned in my previous email.) But I wanted to start off with a simple yet strongly-secure model first. What I described is not supposed to contradict the ability for users to "confirm they could reproduce the builds themselves". As I mentioned, a major use-case is to allow others to download "all the buildinfo files for a given binary package", which they then check locally.
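Ximin's suggestion, storing the signature separately in the database and regenerating the original signed file on demand, can be sketched as below. This assumes the OpenPGP cleartext signature framework (RFC 4880, section 7), LF line endings, and a signer that applied only the mandatory dash-escaping (lines starting with "-"); a signer that optionally escaped other lines would not round-trip byte-for-byte. `StoredBuildinfo`, `split_clearsigned` and `join_clearsigned` are hypothetical names, not dak code, and checking the signature itself is left to an OpenPGP tool such as gpgv.

```python
from dataclasses import dataclass

BEGIN_MSG = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIG = "-----BEGIN PGP SIGNATURE-----"


@dataclass(frozen=True)
class StoredBuildinfo:
    """Hypothetical database row: the pieces of one clearsigned .buildinfo."""
    armor_headers: tuple[str, ...]  # e.g. ("Hash: SHA256",)
    body: str                       # the cleartext, with dash-escaping undone
    signature: str                  # the armored signature block, BEGIN..END


def split_clearsigned(text: str) -> StoredBuildinfo:
    lines = text.split("\n")
    if lines[0] != BEGIN_MSG:
        raise ValueError("not a clearsigned message")
    i = 1
    headers = []
    while lines[i] != "":  # armor headers end at the first empty line
        headers.append(lines[i])
        i += 1
    i += 1
    body = []
    while lines[i] != BEGIN_SIG:
        line = lines[i]
        # Undo dash-escaping: the signer prefixed such lines with "- ".
        body.append(line[2:] if line.startswith("- ") else line)
        i += 1
    return StoredBuildinfo(tuple(headers), "\n".join(body), "\n".join(lines[i:]))


def join_clearsigned(s: StoredBuildinfo) -> str:
    # Re-apply mandatory dash-escaping to every cleartext line starting with "-".
    escaped = ["- " + line if line.startswith("-") else line for line in s.body.split("\n")]
    return "\n".join([BEGIN_MSG, *s.armor_headers, "", *escaped, s.signature])
```

Under those assumptions the body can be indexed like any other archive data, while `join_clearsigned` hands users the exact bytes the builder signed.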
Perhaps the confusion is in the suggestion of a single Buildinfo.tgz. Let me disclaim this for now - I wasn't present for the discussions about why all of this information needs to be in one file, and it actually does *not* make sense to me. An obvious alternative is to cat all the buildinfo files for a given source package into one $source-$version.buildinfos.gz file and store this in pool/. This would also make it easy to look up buildinfo files for a given binary later. Could someone tell me why this approach isn't suitable? Now going back to "users confirming rebuilds": The reason I started off with this high-security dumb-data-blob approach is to make the security arguments and reasoning very simple and obvious, so it's harder to accidentally weaken or subvert it in the future. Debian isn't even involved in the security logic - it's purely the end-user verifier program. Another benefit of signatures is that they give you more information in cases where you might not want to build it yourself (e.g. very large programs). If you strip this information, then only Debian is "attesting" to a particular hash (which it didn't even build). If you keep this information, then you can aggregate multiple people's attempts to build a given binary. Eventually we could have buildinfo-only uploads, just like we have binary-only or source-only uploads. Then for important binaries l
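The per-source alternative above, concatenating every buildinfo for one source package into `$source-$version.buildinfos.gz`, could look roughly like this. It is a sketch only: the functions are hypothetical, each file is assumed to end with a newline, and the lookup simply searches for the .deb filename (for example in a Checksums-Sha256 field) rather than parsing fields.

```python
import gzip
import re

# A clearsigned file starts with this armor line; match it only at a line start,
# so dash-escaped copies ("- -----BEGIN ...") inside a body are not split on.
_START = re.compile(r"(?m)^(?=-----BEGIN PGP SIGNED MESSAGE-----$)")


def bundle_name(source: str, version: str) -> str:
    # Naming pattern from the message; the exact location under pool/ is not specified.
    return f"{source}-{version}.buildinfos.gz"


def pack(buildinfos: list[str]) -> bytes:
    # Concatenate the files unchanged so each keeps its original signature.
    # Assumes every file ends with "\n".
    return gzip.compress("".join(buildinfos).encode("utf-8"))


def unpack(blob: bytes) -> list[str]:
    text = gzip.decompress(blob).decode("utf-8")
    return [p for p in _START.split(text) if p.startswith("-----BEGIN PGP SIGNED")]


def for_binary(blob: bytes, deb_filename: str) -> list[str]:
    # Crude lookup: keep the signed files whose text names the .deb.
    return [b for b in unpack(blob) if deb_filename in b]
```

Because nothing is re-signed, each recovered file can still be verified against its original builder's key.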
Re: [Reproducible-builds] Moving towards buildinfo on the archive network
Jonathan McDowell: > On Sat, Aug 20, 2016 at 03:13:00PM +, Ximin Luo wrote: >> Note that the builder is a *distinct entity* from the distribution. >> It's important to keep the *original* signature by B on C. It breaks >> our security logic to strip the signature and re-sign C using (e.g.) >> the Debian archive release keys - because the entity in charge of this >> release key is not the one that actually performed the build. Doing >> this would allow malicious builders to re-attribute their misdeeds to >> look like it's the fault of Debian. > > Debian already does this in the context of the fact that Packages files > etc are signed by the archive key. It's possible to go and grab the .dsc > file to see who did the file build, but day-to-day no one is using these > to verify the binaries they receive. I care more that Debian stands > behind the packages I download than being able to verify individually > who built each of the packages I'm running - there's no meaningful way I > can attribute trust to *all* of the people who packaged something I have > installed. > You have this backwards. "Being able to verify individually who built each of the packages I'm running" is *exactly* what is required to *not* have to "attribute trust to *all* of the people who packaged something I have installed", and that is one major (probably the main) goal of R-B. Now that I've pointed this out, do you agree, and does it change your mind on anything you previously said? X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git ___ Reproducible-builds mailing list Reproducible-builds@lists.alioth.debian.org http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds
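Ximin's point is that keeping per-builder claims lets a user rely on agreement among builders they chose, instead of trusting everyone involved. One hypothetical local policy, "accept an output hash only if at least k distinct trusted builders claim it and no competing hash meets the same bar", is sketched below. The threshold rule and the `Claim`/`agreed_output` names are illustrative assumptions, not something specified in this thread, and signature checking is assumed to have happened already.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """B claims C: builder `builder` says its build produced `output_sha256`."""
    builder: str        # e.g. the OpenPGP fingerprint that signed the buildinfo
    output_sha256: str  # hash of the output R named in the claim


def agreed_output(claims, trusted: set, threshold: int) -> Optional[str]:
    """Return the single output hash vouched for by at least `threshold` distinct
    trusted builders, or None if none qualifies or several conflicting ones do.
    Assumes every claim's signature has already been verified."""
    supporters = defaultdict(set)
    for c in claims:
        if c.builder in trusted:
            supporters[c.output_sha256].add(c.builder)  # a set, so repeats count once
    winners = [h for h, who in supporters.items() if len(who) >= threshold]
    return winners[0] if len(winners) == 1 else None
```

Returning None on conflict is a deliberate choice for the sketch: two trusted groups disagreeing about the same build is exactly the situation a user would want flagged rather than silently resolved.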
Re: [Reproducible-builds] Moving towards buildinfo on the archive network
On Sat, Aug 20, 2016 at 03:13:00PM +, Ximin Luo wrote: > Jonathan McDowell: > > Having been impressed by the current status of reproducible builds > > and the fact it looks like we're close to having the important > > pieces in Debian proper, I have started to have a look at how I > > could help out with this bug. I've done some poking around in the > > dak code, and think I have a vague idea of how to achieve what I > > think is wanted. > > > > First, it is helpful to describe what I think is wanted. What I > > think we need is the archive network to have, alongside the binary > > packages it contains, details of exactly how to build those > > binaries. This is, I believe, the information contained in the > > .buildinfo files. > > > In our newest discussions, this purpose is secondary. The primary > purpose of buildinfo files is to record what *one particular builder > actually did in order to produce some output*. Or, equivalently: > > | A buildinfo file, abstractly, is a *claim* C by some builder entity B that > | "I executed process P with env/input I to produce output results R". > > This latter form is slightly easier to reason about, in terms of > security properties. We securely bind the claim C (the contents of the > buildinfo file) to the entity B using a cryptographic signature. I think the problem here is that it's not clear (on either side) who "we" or "our" refers to. Different people want different things from reproducible builds, or have different opinions about relative priorities. As a *minimum* I think distributions should be providing information about how a particular binary was produced. I suppose what it sort of maps to is "I executed process P with env/input I to produce output results R" (though, of course, distros already provide R; those are the binaries shipped). You've used all the letters I might want to refer to it by, so let's call it Z. The claim, C, is a signature over Z by B.
It's useful extra information, but it's not required for me to ensure that the source I have built the binaries I have. > Note that the builder is a *distinct entity* from the distribution. > It's important to keep the *original* signature by B on C. It breaks > our security logic to strip the signature and re-sign C using (e.g.) > the Debian archive release keys - because the entity in charge of this > release key is not the one that actually performed the build. Doing > this would allow malicious builders to re-attribute their misdeeds to > look like it's the fault of Debian. Debian already does this in the context of the fact that Packages files etc are signed by the archive key. It's possible to go and grab the .dsc file to see who did the file build, but day-to-day no one is using these to verify the binaries they receive. I care more that Debian stands behind the packages I download than being able to verify individually who built each of the packages I'm running - there's no meaningful way I can attribute trust to *all* of the people who packaged something I have installed. > Now back to the "secondary" purpose: > > Using this information "B claims C", other reproduction programs > (that we're also developing) can attempt to actually reproduce the > binaries described. It would do this by (1) reading the buildinfo > file, (2) recreating _some_ of the environment stored in C, and (3) > executing the process and checking whether it gives R. You don't need the signature to validate the reproducibility. > The "_some_" in clause (2) is currently up-for-debate, but the > important thing is that this can be changed in the future *without > affecting already-produced buildinfo files*. It may even well be the > case that in the future we'd want to support different values for > "_some_" for a given reproduction tool. > > The main point is that this is not a concern of the producer or > distributor of the buildinfo files.
I.e.: you guys (the FTP team) only > have to care about making these signed claims available to be > downloaded by users, and it is up to the users to run a tool that > "interprets" these claims for purposes such as actually attempting > reproduction of a binary. To clarify: I am not a member of the FTP team and do not claim to represent them. I am a DD who was present at the DebConf talk about reproducible builds, was impressed by how far it's come, and asked how I could help get what was missing and still required into Debian. > In this way, we achieve full end-to-end security properties > (verifiability of build) between the producers (builders) and > consumers (users). Distributors only need to care about availability; > they take no part in the security (except for the case where they are > also a builder, as noted already). I think I take a less strict view on this, which may be where some of the disconnect comes from. I care that Debian stands behind its builds. I'd like the builder claims to be available (and my original mail did talk about the fact I didn't think I wa
Re: [Reproducible-builds] Moving towards buildinfo on the archive network
Hey, Lunar has stopped doing reproducible builds as a regular thing, and I'm taking over his previous responsibilities. I was also the other main person involved in formulating the ideas behind the "next iteration" of buildinfo that dkg described in message #10 earlier in this thread, with Message-ID <87vb8f58rg@alice.fifthhorseman.net>. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=763822#10 Jonathan McDowell: > Having been impressed by the current status of reproducible builds and > the fact it looks like we're close to having the important pieces in > Debian proper, I have started to have a look at how I could help out > with this bug. I've done some poking around in the dak code, and think I > have a vague idea of how to achieve what I think is wanted. > > First, it is helpful to describe what I think is wanted. What I think we > need is the archive network to have, alongside the binary packages it > contains, details of exactly how to build those binaries. This is, I > believe, the information contained in the .buildinfo files. > In our newest discussions, this purpose is secondary. The primary purpose of buildinfo files is to record what *one particular builder actually did in order to produce some output*. Or, equivalently: | A buildinfo file, abstractly, is a *claim* C by some builder entity B that | "I executed process P with env/input I to produce output results R". This latter form is slightly easier to reason about, in terms of security properties. We securely bind the claim C (the contents of the buildinfo file) to the entity B using a cryptographic signature. Note that the builder is a *distinct entity* from the distribution. It's important to keep the *original* signature by B on C. It breaks our security logic to strip the signature and re-sign C using (e.g.) the Debian archive release keys - because the entity in charge of this release key is not the one that actually performed the build.
Doing this would allow malicious builders to re-attribute their misdeeds to look like it's the fault of Debian. (Of course there is the special case where the builder *is* Debian, but even in this case it's good practise to have separate keys for every buildd, plus a separate release signing key. We can discuss these details separately though.) Anyway, that's our "next iteration" definition of buildinfo files, along with a simplified discussion of the rationale. I wrote down more elsewhere, but I'll keep this short for now, to avoid overwhelming readers. Now back to the "secondary" purpose: Using this information "B claims C", other reproduction programs (that we're also developing) can attempt to actually reproduce the binaries described. It would do this by (1) reading the buildinfo file, (2) recreating _some_ of the environment stored in C, and (3) executing the process and checking whether it gives R. The "_some_" in clause (2) is currently up-for-debate, but the important thing is that this can be changed in the future *without affecting already-produced buildinfo files*. It may even well be the case that in the future we'd want to support different values for "_some_" for a given reproduction tool. The main point is that this is not a concern of the producer or distributor of the buildinfo files. I.e.: you guys (the FTP team) only have to care about making these signed claims available to be downloaded by users, and it is up to the users to run a tool that "interprets" these claims for purposes such as actually attempting reproduction of a binary. In this way, we achieve full end-to-end security properties (verifiability of build) between the producers (builders) and consumers (users). Distributors only need to care about availability; they take no part in the security (except for the case where they are also a builder, as noted already).
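The comparison at the end of step (3), "see if it gives R", might look like the sketch below. It assumes R is expressed through the Checksums-Sha256 field of the .buildinfo format (continuation lines of the form `digest size filename`, see deb-buildinfo(5)) and that the rebuilt files already sit in a local directory; recreating the environment, step (2), is deliberately left out. The function names are hypothetical. As Jonathan notes, this check does not need the signature at all: the signature only tells you *who* claimed R.

```python
import hashlib
from pathlib import Path


def expected_sha256(buildinfo_text: str) -> dict[str, str]:
    """Parse the Checksums-Sha256 field; each continuation line is 'digest size filename'."""
    result = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line.startswith(" "):
            digest, _size, name = line.split()
            result[name] = digest
        else:
            in_field = False  # any other line ends the multi-line field
    return result


def compare_rebuild(buildinfo_text: str, rebuilt_dir: Path) -> dict[str, bool]:
    """For each file the claim lists, report whether our rebuilt copy has the same hash."""
    report = {}
    for name, digest in expected_sha256(buildinfo_text).items():
        path = rebuilt_dir / name
        report[name] = path.is_file() and hashlib.sha256(path.read_bytes()).hexdigest() == digest
    return report
```

A missing rebuilt file is reported as a mismatch rather than an error, so a partial rebuild still yields a complete report.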
> This bug has previously talked about a tarball of .buildinfo files, > presented as Buildinfos.tgz alongside the Packages file. From looking at > the current architecture of dak I do not believe that this is an easy > option. > > I propose instead a Buildinfo.xz (or gz or whatever) file, which is > a single text file containing all of the buildinfo information that > corresponds to the Packages list. What is lost by this approach are the > OpenPGP signatures that .buildinfo files can have on them. I appreciate > this is an important part of the reproducible builds aim, but I believe > one of its strengths is the ability for multiple separate package builds > to attest that they have used that buildinfo information to build the > exact same set of binary artefacts. This is not something that easily > scales on the archive network and I think it is better served by a > separate service; it would be possible to take the package snippet from > the buildinfo file and sign that alone, uploading the signature to the > attestation service. For "normal" Debian operation the usual archive > signatures would provide a basic level of attestation of chain
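Jonathan's Buildinfo.xz, generated each archive run from the database in the same way as Packages, could be produced roughly as follows. The row format (one dict of field name to value per buildinfo, with multi-line values as newline-separated text) is a hypothetical stand-in for whatever dak actually stores; the output is plain deb822 paragraphs separated by blank lines, then xz-compressed with Python's standard `lzma` module.

```python
import lzma


def stanza(fields: dict[str, str]) -> str:
    """Render one deb822 paragraph; extra lines of a value become indented continuations."""
    out = []
    for name, value in fields.items():
        first, *rest = value.split("\n")
        out.append(f"{name}: {first}".rstrip())
        # deb822 writes an empty continuation line as " ."
        out.extend(f" {line}" if line else " ." for line in rest)
    return "\n".join(out) + "\n"


def buildinfo_index(rows: list[dict[str, str]]) -> bytes:
    """Join one paragraph per row with blank lines, as Packages does, and xz-compress it."""
    text = "\n".join(stanza(r) for r in rows)
    return lzma.compress(text.encode("utf-8"), format=lzma.FORMAT_XZ)
```

The output of `stanza()` for a single row is also a natural candidate for the "package snippet" a builder could sign on its own for a separate attestation service, though the thread does not pin down that format.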
Re: [Reproducible-builds] Moving towards buildinfo on the archive network
Hi Jonathan, Quoting Jonathan McDowell (2016-07-25 22:29:39) > Having been impressed by the current status of reproducible builds and > the fact it looks like we're close to having the important pieces in > Debian proper, I have started to have a look at how I could help out > with this bug. I've done some poking around in the dak code, and think I > have a vague idea of how to achieve what I think is wanted. Having tried hacking dak myself, I want to especially thank you for looking into that! > (Additionally it is not clear to me where the dpkg status for buildinfo > creation is; I have heard that it's close to happening, but I can't find > anything on recent list archives about it - pointers appreciated!) You are probably aware of #138409? It scrolled out of my IRC history already, but I think guillem said in #debian-dpkg that releasing a dpkg version with buildinfo support was blocked on coordination with dak, because he wants to make sure that dpkg support aligns with what dak ends up supporting. Thanks! cheers, josch
Re: [Reproducible-builds] Moving towards buildinfo on the archive network
On 2016-07-25, Jonathan McDowell wrote: > I propose instead a Buildinfo.xz (or gz or whatever) file, which is > a single text file containing all of the buildinfo information that > corresponds to the Packages list. What is lost by this approach are the > OpenPGP signatures that .buildinfo files can have on them. I appreciate > this is an important part of the reproducible builds aim, but I believe > one of its strengths is the ability for multiple separate package builds > to attest that they have used that buildinfo information to build the > exact same set of binary artefacts. This is not something that easily > scales on the archive network and I think it is better served by a > separate service; it would be possible to take the package snippet from > the buildinfo file and sign that alone, uploading the signature to the > attestation service. For "normal" Debian operation the usual archive > signatures would provide a basic level of attestation of chain of build > information. > > The rest of this mail continues on the above assumptions. If you do not > agree with the above, the below is probably null and void, so ignore it > and instead educate me about what the requirements are and I'll try and > adjust my ideas based on that. > > So. If a single Buildinfo.xz file is acceptable, with the attestation > being elsewhere, I think this is doable without too much hackery in dak. > There are some trade-offs to make though, and I need to check which are > acceptable and which are viewed as too much. I just wanted to give a huge thanks for taking a good look at this, even if it isn't exactly what has been specced out by earlier reproducible-builds discussions. Evaluating a somewhat different approach, especially if it turns out to be more feasible (at least from some angles), is really valuable in my eyes.
FWIW, I wasn't involved in the discussions spelling out what the reproducible builds project wanted in the archive, so I don't have much concrete to say, but you've clearly given some serious thought and effort to this, so I didn't want it to slip through the cracks! I tried to read through some of the documentation I could find: https://wiki.debian.org/ReproducibleBuilds/BuildinfoSpecification https://reproducible-builds.org/events/athens2015/debian-buildinfo-review/ https://reproducible-builds.org/events/athens2015/buildinfo-content/ Having reviewed the above, there doesn't seem to be a huge conflict that you haven't at least considered already. Hopefully, someone with more history and context with the .buildinfo file discussions can chime in soonish... live well, vagrant