Eric Blake <ebl...@redhat.com> writes:

> Widening the audience to include bug-gnulib, which is the upstream
> source of "# build-to-host.m4 serial 3" which was bypassed by the
> malicious "# build-to-host.m4 serial 30".
>
> On Sun, Mar 31, 2024 at 11:51:36PM +0200, Guillem Jover wrote:
>> Hi!
>> 
>> While analyzing the recent xz backdoor hook into the build system [A],
>> I noticed that one reason the hook worked is that «autoreconf -f -i»
>> (which Debian runs as part of dh-autoreconf via dh) still seems to
>> take the serial number into account, and that serial had been bumped
>> in the tampered .m4 file. If gettext.m4 had been downgraded (to the
>> version currently in Debian, which would not have pulled in the
>> tampered build-to-host.m4), or once Debian upgrades gettext,
>> build-to-host.m4 would be downgraded to the clean upstream version,
>> which would disable the hook and leave the backdoor inert. (Of course
>> at that point the malicious actor would have found another way to hook
>> into the build system, but the fewer avenues there are the better.)
>> 
>> I searched the list and checked for old bug reports on the
>> debbugs.gnu.org site, but didn't notice anything. To me this looks like
>> very unexpected behavior, but it's not clear whether it is intentional
>> or a bug. Either way, it would be good to improve this, either by
>> making --force also force downgrades, or perhaps by adding a new
>> option that really forces everything.
>> 
>> [A] <https://lists.debian.org/debian-devel/2024/03/msg00367.html>
>>     Longish mail, search for "try to go in detail" for the analysis.
>
> My understanding is that the use of serial numbers in .m4 snippets was
> intentional in gnulib (more or less where the practice originated),
> but only because gnulib prefers a linear history (everything is
> monotonically increasing, no forks for the serial number to diverge
> on).  In light of this weekend's mess, Bruno may have more ideas about
> how to prevent his files from being turned into backdoor delivery
> mechanisms in the future.

I think the root cause here is assuming 'autoreconf -fi' achieves
anything related to re-bootstrapping.  I think the entire concept of
re-bootstrapping from a source tarball with generated contents in it is
fundamentally flawed.  I have proposed that we start releasing
*-src.tar.gz tarballs that contain no pre-generated content and that
can be completely bootstrapped using external tools.  See the writeup here:

https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/

To me, moving things towards this approach allows incremental work that
eventually will be more reliable than anything that attempts to
re-bootstrap from a tarball with some pre-generated artifacts in it
(because there will always be uncertainty about whether an artifact
that was used was actually built or came from the tarball).

I suggest that we extend 'make dist' to produce these *-src.tar.gz
tarballs, possibly only when some new AM_INIT_AUTOMAKE option is
used.  There could be a way to customize how the tarball is
generated, much like the dist-hook rules we have today, which are often
used to generate the ChangeLog for tarballs.  Thoughts?

/Simon
