Hi Lucas,

> > and we didn't really make any progress on improving the
> > git2dcs audit trail..
>
> Actually, I've been working on a service that tries to audit exactly
> that: https://debaudit.debian.net/git2dsc/

In the part you didn't quote I wrote:

> The fact that currently about 8% of the orig
> tarballs in Debian unstable can't be proven to match 1:1 the upstream
> tarballs and is in need of a manual review I think severely undermines
> reproducible builds.

I didn't cite a source for this 8% figure because I didn't know whether
you were ready to announce https://debaudit.debian.net/, but I am glad
you have done so now. I hope you will also send a separate, wider
announcement email about this new service. A big thanks to you for
working on this for the past month!

It does still hold true that "we didn't make progress on improving the
git2dsc audit trail", as git2dsc analyses data points that already
existed before this thread started. The 'trail' didn't improve, it is
still the same data points, but the publicly available audit reports
certainly made a major leap!

> It still needs a few rounds of debugging and improvements, but results
> already don't look too bad at this point:
> https://debaudit.debian.net/git2dsc/statistics
>
> I'm not sure that we need Git-Tag-Info (for non-tag2upload packages), as
> it sounds easy enough to just guess what is the tag to follow.

I wrote to you yesterday about some ideas on how to improve the
automatic classification of failures, and I am sure we can increase the
share of sources your new system can verify from the current ~75% to
much closer to 100%. However, I wouldn't describe the current state as
"easy enough to just guess what is the tag to follow" when the system
reports that ~7000 packages can't yet be traced. Given enough effort we
can track them all; I am just saying it isn't "easy enough".

Another aspect to consider is that when an upload carries metadata
stating which git commit id / git tree id the uploader claims to have
made it from, the audit trail it forms also acts as an attestation that
the person and the tool intended to upload the same thing as was in git.
Given enough effort, we can already find all discrepancies between
uploads and git contents, but it is hard to prove whether something was
done by accident, out of ignorance, or actually maliciously. Having an
additional data point about intent makes it easier to prove that
somebody did something wrong intentionally.

I have written way too many messages to this thread today and will take
a break, but I am happy to continue discussions about debaudit in the
appropriate development forum for it. As said, this is a major leap in
making audit reports and stats easily accessible to anyone. Thanks!

