Jonathan McDowell:
> On Sat, Aug 20, 2016 at 03:13:00PM +0000, Ximin Luo wrote:
>> I have trouble imagining what could make Buildinfo.tgz hard, but make
>> Buildinfo.xz easy - could you explain this in more detail, please?
>
> Debian's archive information is largely stored within a database; things
> like the Packages and Contents files are generated each archive run from
> this database, rather than incrementally updating a file. It is easy to
> generate a Buildinfo.xz file from information contained within the
> database (I have some proof-of-concept code locally that does the
> beginnings of this), but generating a tar file like you are describing
> is either a case of storing each .buildinfo in the database and
> generating the tar each run, or adding and deleting files to an existing
> tarball. It seems overly intensive and doesn't really seem to scale.
>
>> Regarding the OpenPGP signatures, they are vital - but I also see no
>> need to strip them in your model. From the point-of-view of the FTP
>> archive, there is no immediate need to read or understand the contents
>> of the buildinfo file. [*] It's just a dumb data blob, it shouldn't
>> matter to Debian whether it's clearsigned or not.
>
> What I was trying to do with my proposal was turn it from being a dumb
> data blob which wasn't easily mapping to the Debian infrastructure, to
> something where almost all the information (everything except the actual
> signature from the original builder) could be provided alongside the
> binaries themselves, enabling people to have what they required to
> confirm they could reproduce the builds themselves. *I* think this is
> incredibly useful, even if it doesn't achieve everything possible with
> reproducible-builds, and I also think that it would provide a sound
> basis for another Debian service (perhaps under debian.net to start
> with) where multiple builders (starting with the original builder) would
> be able to upload their claims, based directly off the buildinfo
> information from the archive network. Yes, that's probably an extra
> step for the original builder, but it also (to me) seems to be more
> flexible and a stronger statement as multiple independent builders can
> all confirm things in a single place.
>
> It sounds like this isn't compatible with where reproducible-builds is
> heading though, so apologies for the noise.
>
I don't mean to suggest a database is not useful. I thought I was talking to ftp-masters through you, so I wanted to be very clear about the security properties we're aiming for, and get a common understanding about that first.

But I'm not sure why you say it's incompatible - could you not also store the detached signatures within the database, and generate the original file (including signature) from this and the other information? The signatures are much smaller than the rest of the file.

In fact, we do have longer-term plans for Debian infrastructure to look into this data rather than treat it as a data blob - for example, buildds themselves could try to reproduce a given buildinfo uploaded by a DD, and send alerts about packages that can't be reproduced. (I hinted at this with the "more advanced" behaviours I mentioned in my previous email.) But I wanted to start off with a simple yet strongly-secure model first.

What I described is not supposed to contradict the ability for users to "confirm they could reproduce the builds themselves". As I mentioned, a majority use-case is to allow others to download "all the buildinfo files for a given binary package" and then check them locally.

Perhaps the confusion is in the suggestion of a single Buildinfo.tgz. Let me disclaim this for now - I wasn't present for the discussions around why all of this information needs to be in one file, and it actually does *not* make sense to me.

An obvious alternative is to cat all the buildinfo files for a given source package into one $source-$version.buildinfos.gz file and store this in pool/. This would also make it easy to look up buildinfo files for a given binary later. Could someone tell me why this approach isn't suitable?
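
As a rough illustration of the "cat into one .buildinfos.gz" idea, here is a minimal Python sketch. It assumes each .buildinfo is an OpenPGP clearsigned text file, so the individual files can be recovered by splitting on the clearsign armor lines. The function names `bundle_buildinfos` and `split_buildinfos` are hypothetical and are not part of any Debian archive tooling; nothing here reflects an agreed format.

```python
import gzip

# Armor lines that delimit an OpenPGP clearsigned message.
BEGIN = b"-----BEGIN PGP SIGNED MESSAGE-----"
END = b"-----END PGP SIGNATURE-----"


def bundle_buildinfos(buildinfos):
    """Concatenate clearsigned .buildinfo blobs and gzip the result.

    Each blob is normalised to end with exactly one newline so that
    consecutive files never run together on one line.
    """
    joined = b"".join(blob.rstrip(b"\n") + b"\n" for blob in buildinfos)
    return gzip.compress(joined)


def split_buildinfos(bundle):
    """Recover the individual clearsigned blobs from a bundle.

    Clearsigning dash-escapes body lines that start with "-", so a bare
    BEGIN line can only be the start of a new clearsigned message.
    """
    data = gzip.decompress(bundle)
    blobs = []
    current = []
    for line in data.splitlines(keepends=True):
        stripped = line.rstrip(b"\r\n")
        if stripped == BEGIN:
            current = [line]
        elif current:
            current.append(line)
            if stripped == END:
                blobs.append(b"".join(current))
                current = []
    return blobs
```

Because the signatures travel inside each clearsigned block, a verifier can check every recovered blob independently, and adding another builder's .buildinfo is just a matter of regenerating the bundle with one more entry.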

Now going back to "users confirming rebuilds": The reason why I started off with this high-security dumb-data-blob approach is to make the security arguments and reasoning very simple and obvious, so it's harder to accidentally weaken or subvert them in the future. Debian isn't even involved in the security logic - it's purely the end-user verifier program.

Another benefit of signatures is that they give you more information in the cases where you might not want to build it yourself (e.g. very large programs). If you strip this information, then only Debian is "attesting" to a particular hash (which it didn't even build). If you keep this information, then you can aggregate multiple people's attempts to build a given binary.

Eventually we could have buildinfo-only uploads, just like we have binary-only or source-only uploads. Then for important binaries like gcc, perhaps 20 people will want to upload their .buildinfo files to Debian with their signatures attached, to make us all feel better about that.

Note also in general that you don't actually *want* all of the buildinfo fields to be the same for everyone. Only the output *has* to be the same, and it is actually a stronger security property if we get two buildinfo files that started off with *different* inputs (such as buildpath/time/etc) and got the *same* binary hashes out.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git