Hi!

On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
> This is a vector I've been somewhat paranoid about myself, and I
> typically check the difference between git archive $TAG and the downloaded
> tar, whenever I package things.  Obviously a backdoor could have been
> inserted into the git repository directly, but there is a culture
> surrounding good hygiene in commits: they ought to be small, focused,
> and well described.

But the backdoor was in fact included in a git commit (it's hidden
inside a compressed test file).

The part that was only present in the tarball was the code to extract
and hook the inclusion of the backdoor via the build system.

> People are comfortable discussing and challenging
> a commit that looks fishy, even if that commit is by the main developer
> of a package.  I have been assuming tooling existed in package
> maintainers' toolkits to verify the faithful reproduction of the
> published git tag in the downloaded source tarball, beyond a signature
> check by the upstream developer.  Apparently, this is not universal.
> 
> Had tooling existed in Debian to automatically validate this faithful
> reproduction, we might not have been exposed to this issue.

Given that the autogenerated stuff is not present in the git tree,
a diff between tarball and git would always generate tons of delta,
so this would not have helped.
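
To make the noise problem concrete, here is a minimal sketch (Python,
standard library only) of such a comparison. The helper names and the
choice to strip one leading directory are assumptions made for this
example, not an existing Debian tool:

```python
import io
import tarfile


def tar_members(data, strip_components=1):
    """Map each regular file path in a tar archive to its contents.

    strip_components drops leading path elements (such as "pkg-1.0/")
    so that two archives with different top-level directories compare.
    """
    members = {}
    with tarfile.open(fileobj=io.BytesIO(data)) as tar:
        for info in tar:
            if not info.isfile():
                continue
            parts = info.name.split("/")[strip_components:]
            if parts:
                members["/".join(parts)] = tar.extractfile(info).read()
    return members


def compare_trees(release, vcs):
    """Return (only_in_release, only_in_vcs, changed) as sorted lists."""
    only_release = sorted(set(release) - set(vcs))
    only_vcs = sorted(set(vcs) - set(release))
    changed = sorted(
        path for path in set(release) & set(vcs)
        if release[path] != vcs[path]
    )
    return only_release, only_vcs, changed
```

Fed the release tarball and the output of «git archive» for the same
tag, the first list would be expected to contain configure, aclocal.m4,
every Makefile.in and so on for a typical autotools project, so one
extra .m4 file does not stand out by itself.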

> Having done this myself, it has been my experience that many partial
> build artifacts are captured in source tarballs that are not otherwise
> maintained in the git repository.  For instance, in zfs (which I have
> contributed to in the past), many automake files are regenerated.
> (I do not believe that specific package is vulnerable to an attack
> on the autoconf/automake files, since the debian package calls the
> upstream tooling to regenerate those files.)
> 
> We already have a policy of not shipping upstream-built artifacts, so
> I am making a proposal that I believe simply takes that one step further:
> 
> 1. Move towards allowing, and then favoring, git-tags over source tarballs

I assume you mean git archives out of git tags? Otherwise how do you
go from git-tag to a source package in your mind?

> 2. Require upstream-built artifacts be removed (instead, generate these
>    ab-initio during build)

The problem here is that the .m4 file to hook into the build system was
named like one shipped by gnulib (so less suspicious), but xz-utils does
not use gnulib, and thus the autotools machinery does not know anything
about it, so even the «autoreconf -f -i» done by debhelper via
dh-autoreconf would not regenerate it.

Removing these might be cumbersome after the fact if upstream includes
for example their own maintained .m4 files. See dpkg's m4 dir for an
example of this (although there it's easy as all are namespaced but…).

Not using an upstream-provided tarball might also mean we stop being
able to use upstream signatures, which seems worse. The alternative
might be encouraging upstreams to just do the equivalent of
«git archive», but that might defeat the portability and dependency
reduction properties that were designed into the autotools build
system, or increase the bootstrap set (see for example the
pkg.dpkg.author-release build profile used by dpkg).

(For dpkg at least I'm pondering whether to play with switching to
doing something equivalent to «git archive» though, but see above, or
maybe generate two tarballs, a plain «git archive» and a portable one.)

> 3. Have tooling that automatically checks the sanitized sources against
>    the development RCSs.

Perhaps we could have a declarative way to state all the autogenerated
artifacts included in a tarball that need to be cleaned up
automatically after unpack, in a similar way as how we have a way to
automatically exclude stuff when repackaging tarballs via uscan?

(.gitignore files, if upstream maintains them properly, might be a
good starting point, but they will tend to list more than necessary.)
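
As a rough illustration of what such a declaration could drive, here
is a sketch in Python. The pattern list, its format and the function
name are hypothetical, made up for this example:

```python
from fnmatch import fnmatchcase

# Hypothetical declaration of autogenerated files to drop after unpack.
# Note that fnmatch's "*" also matches "/", so "*/Makefile.in" covers
# any depth, while the bare "Makefile.in" only covers the top level.
GENERATED = [
    "configure",
    "aclocal.m4",
    "Makefile.in",
    "*/Makefile.in",
]


def split_generated(paths, patterns):
    """Split paths into (generated, kept) according to the patterns."""
    generated, kept = [], []
    for path in paths:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            generated.append(path)
        else:
            kept.append(path)
    return generated, kept
```

Listing individual files rather than a blanket pattern such as "m4/*"
is deliberate here, so that .m4 files maintained by upstream are kept
and anything left over that is not in the VCS can be looked at.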

> 4. Look unfavorably on upstreams without RCS.

Some upstreams have a VCS, but still do massive code drops, or include
autogenerated stuff in the VCS, or do not do atomic commits, or write
commit messages in the style of "fix stuff", "." or alike. So while
this is something we should encourage, it's not
sufficient. I think part of this might already be present in our
Upstream Guidelines in the wiki.

> In the present case, the triggering modification was in a modified .m4 file
> that injected a snippet into the configure script.  That modification
> could have been flagged using this kind of process.

I don't think this modification would have been spotted, because it
did not modify a file that would usually get autogenerated by the
build system.

> While this would be a lot of work, I believe doing so would require a
> much larger amount of additional complexity in orchestrating attacks
> against Debian in the future.

It would certainly make it a bit harder, but I'm afraid that if you
cannot trust upstream and they are playing a long game, then IMO they
can still sneak in nasty stuff even in plain sight with just code
commits, unless you are paying extremely close attention. :/

See for example <https://en.wikipedia.org/wiki/Underhanded_C_Contest>.

Thanks,
Guillem
