Hi Lucas,

> > and we didn't really make any progress on improving the
> > git2dsc audit trail..
>
> Actually, I've been working on a service that tries to audit exactly
> that: https://debaudit.debian.net/git2dsc/

In the part you didn't quote I wrote:

> The fact that currently about 8% of the orig tarballs in Debian
> unstable can't be proven to match the upstream tarballs 1:1, and need
> manual review, severely undermines reproducible builds, in my view.

I didn't cite the source of this 8% figure, as I didn't know whether
you were ready to announce https://debaudit.debian.net/, but I am glad
you have now done so. I hope you will also send a separate, wider
announcement email about this new service. A big thanks to you for
working on this over the past month!

It does still hold true that "we didn't make progress on improving the
git2dsc audit trail", as git2dsc analyses data points that already
existed before this thread started. The 'trail' itself didn't improve;
it is still the same data points. But publicly available audit reports
certainly made a major leap!

> It still needs a few rounds of debugging and improvements, but results
> already don't look too bad at this point:
> https://debaudit.debian.net/git2dsc/statistics
>
> I'm not sure that we need Git-Tag-Info (for non-tag2upload packages), as
> it sounds easy enough to just guess what is the tag to follow.

I wrote to you yesterday with some ideas on how to improve the
automatic classification of failures, and I am sure we can raise the
share of sources your new system can verify from the current ~75% to
much closer to 100%. However, I wouldn't describe the current state as
"easy enough to just guess what is the tag to follow" when the system
reports that ~7000 packages can't yet be traced. Given enough effort we
can trace them all; I'm just saying it isn't "easy enough".

Another aspect to consider is that when an upload carries metadata
about which git commit ID / git tree ID the uploader claims to have
made it from, the resulting audit trail also acts as an attestation
that the person and the tool intended to upload the same thing as was
in git. Given enough effort we can already find all discrepancies
between uploads and git contents, but it is hard to prove whether
something was done by accident, out of ignorance, or maliciously.
Having an additional data point about intent makes it easier to prove
that somebody intentionally did something wrong.


I have written way too many messages to this thread today and will
take a break, but I am happy to continue discussing debaudit in the
appropriate development forum. As I said, this is a major leap in
making audit reports and stats easily accessible to anyone. Thanks!
