dherr...@tentpost.com wrote:
> On 2024-03-30 18:25, Bruno Haible wrote:
>> Eric Gallager wrote:
>>> Hm, so should automake's `distcheck` target be updated to perform
>>> these checks as well, then?
>>
>> The first mentioned check can not be automated. ...
>>
>> The second mentioned check could be done by the maintainer, ...


> I agree that distcheck is good but not a cure-all. Any static system can be attacked when there is motive, and unit tests are easily gamed.

> The issue seems to be releases containing binary data for unit tests, instead of source or scripts to generate that data. In this case, that binary data was used to smuggle in heavily obfuscated object code.

The best analysis in one place that I have found so far is <URL:https://gynvael.coldwind.pl/?lang=en&id=782>. In brief, grep is used to locate the main backdoor files by searching for marker strings.

After running tests/files/bad-3-corrupt_lzma2.xz through tr(1), it becomes a /valid/ xz file that decompresses to a shell script, which in turn extracts a second shell script from part of the compressed data in tests/files/good-large_compressed.lzma and pipes it to a shell.

That second script has two major functions. First, it searches the test files for four six-byte markers; those markers delimit raw LZMA2 streams obfuscated with a simple substitution cipher. Any such streams found would be decompressed and read by the shell, but neither of the known crocked releases had any files containing those markers. Second, it extracts and decrypts (using a simple RC4-alike implemented in Awk) the binary backdoor also found in tests/files/good-large_compressed.lzma.

The binary backdoor is an x86-64 object that gets unpacked into liblzma_la-crc64-fast.o, unless m4/gettext.m4 contains "dnl Convert it to C string syntax." That condition is a clever flag, because almost no one checks that the m4 files in release tarballs actually match what the GNU project distributes. The object itself is just the backdoor and presumably provides the symbol _get_cpuid as its entrypoint, since the unpacker script patches the src/liblzma/check/crc{64,32}_fast.c files in a pipeline to add calls to that function and drops the compiled objects in .libs/. Running make will then skip building those objects, since they are already up-to-date, and the backdoored objects get linked into the final binary.
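
If you want to reproduce that first stage against the published test files, it is an ordinary shell pipeline. The sketch below uses the tr byte map reported in the public write-ups, so verify it against your own copy before relying on it, and inspect the output rather than piping it anywhere near a shell:

  # Stage 1 (sketch): un-mangle the "corrupt" test file into a valid .xz
  # stream, then decompress it to recover the first-stage shell script.
  tr "\t \-_" " \t_\-" < tests/files/bad-3-corrupt_lzma2.xz \
      | xz -dc > stage1.sh    # read stage1.sh; do NOT execute it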

Commit 6e636819e8f070330d835fce46289a3ff72a7b89 (<URL:https://git.tukaani.org/?p=xz.git;a=commitdiff;h=6e636819e8f070330d835fce46289a3ff72a7b89>) was an update to the backdoor. The commit message is suspicious, claiming the use of "a constant seed" to generate reproducible test files, but /not/ declaring how the files were produced, which of course prevents reproducibility.
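
For contrast, a commit that genuinely made test files reproducible would add the generator itself, so anyone could re-run it and byte-compare the output. A minimal hypothetical sketch (the file name, seed, and size here are made up, and the exact xz version used would also need to be recorded):

  #!/bin/sh
  # gen-good-seeded.sh (hypothetical): regenerate a binary test fixture
  # from a fixed, documented input instead of shipping an unexplained blob.
  seed=20240309
  yes "$seed" | head -c 65536 | xz -9e > tests/files/good-seeded.xz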

> With a reproducible build system, multiple maintainers can "make dist" and compare the output to cross-check for erroneous / malicious dist environments. Multiple signatures should be harder to compromise, assuming each is independent and generally trustworthy.

This can only work if a package /has/ multiple active maintainers.

You also have a small misunderstanding here: "make dist" prepares a (source) release tarball, not a binary build, so this is a closely related issue but actually distinct from reproducible builds. It is also easier to solve, since we only have to make the source tarball reproducible.
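
Concretely, the cross-check only needs the tarball to be bit-identical when built from the same tag; until "make dist" gets there, the comparison has to be on the extracted contents rather than on a hash. A rough sketch (package name, version, and directory names are placeholders):

  # Ideal case: the source tarballs are bit-identical, so hashes suffice.
  sha256sum foo-1.2.3.tar.gz             # each maintainer publishes this
  # Fallback while "make dist" is not yet bit-reproducible: diff contents.
  mkdir mine yours
  tar -xzf maintainer-a/foo-1.2.3.tar.gz -C mine
  tar -xzf maintainer-b/foo-1.2.3.tar.gz -C yours
  diff -r mine yours        # any difference flags a suspect dist environment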

> Maybe GNU should establish a cross-verification signing standard and "dist verification service" that automates this process? Point it to a repo and tag, request a signed hash of the dist package... Then downstream projects could check package signatures from both the maintainer and such third-party verifiers to check that nothing was inserted outside of version control.

Essentially, this would be an automated release building service: upon request, make a Git checkout, run autogen.sh or equivalent, run "make dist", and publish or hash the result. The problem is that an attacker who manages to gain commit access to a repository may be able to launch attacks on the release building service itself, since "make dist" can run scripts. The service could probably mount the working filesystem noexec, since preparing source releases should not require running (non-system) binaries; but scripts can still be run by feeding them directly to their interpreters even on a noexec filesystem, so all of the available interpreters and system tools remain potentially exposed to the attacker.
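
The mechanical part of such a service is small; the hard part is containing what "make dist" is allowed to do. A hedged sketch (repository URL, tag, and package name are placeholders), meant to run inside a disposable container or VM:

  #!/bin/sh
  # verify-dist.sh (hypothetical): rebuild the source tarball from a tag
  # and publish a signed hash.  Everything below runs maintainer-controlled
  # code, so the whole job should be sandboxed and discarded afterwards.
  set -e
  git clone --branch v1.2.3 https://example.org/foo.git work
  cd work
  ./autogen.sh                  # or: autoreconf -fi
  ./configure
  make dist
  sha256sum foo-1.2.3.tar.gz | gpg --clearsign > ../foo-1.2.3.sha256.asc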


-- Jacob
