Re: lcamtuf on the recent xz debacle

Christian Weisgerber Thu, 04 Apr 2024 18:10:46 -0700

Katherine Mcmillan:

> Just for clarity, does anyone know what "Unix-like operating systems"
> would be affected by this?


None.  TLDR: The build process of the backdoor explicitly aborts
on platforms other than Linux x86-64.

As the maintainer of the archivers/xz port, I took a look at the
build stages of the malicious code, because I had already prepared
an update to 5.6.1 and run the code in question.

Two ostensible test files were committed to the xz repository
immediately before the 5.6.0 release and updated immediately before
5.6.1: bad-3-corrupt_lzma2.xz, which as the name suggests is a
malformed compressed file, and good-large_compressed.lzma, which
is a valid file and extracts to a mixture of easily compressible
repeated characters and uncompressible pseudo-random data.  By
themselves those files are completely harmless.

As is common practice, the xz repository only contains input files
like configure.ac and Makefile.am for the GNU autotools.  For the
release tarball, an autotools run generates the actual configure
script, Makefile.in, etc., so the result can be built with "./configure
&& make".

For the 5.6.0 and 5.6.1 releases, the build-to-host.m4 macro package
that ships as part of GNU gettext was replaced by a modified version
that was copied into the release tarball and, importantly, was used
to generate a modified configure script.  Let's call this stage 0.

When you run the configure script, the stage 0 shell snippet is
executed.  The malicious code runs a pipe of commands that reads
the bad-3-corrupt_lzma2.xz file, swaps some byte values to turn it
into a valid file, extracts the file with xz (which must already
be installed), and feeds the content--let's call it stage 1--into
a shell.
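
The shape of such a pipe can be illustrated with a harmless sketch.
Everything specific here is an assumption made for the example: the
byte mapping (tab and space swapped), gzip standing in for xz so the
sketch runs without xz installed, the function names, and the payload,
which is a single echo command.

```shell
#!/bin/sh
# Harmless illustration only. The byte mapping, the use of gzip instead of
# xz, and the payload text are assumptions, not the actual stage 0 code.

# Swap two byte values (tab <-> space). Applying it twice restores the input.
swap_bytes() {
    LC_ALL=C tr '\011\040' '\040\011'
}

# Build a disguised file, then recover and run its content the way a
# stage 0 style pipe would: read, swap bytes back, decompress, feed to sh.
demo_stage0() {
    tmp=$(mktemp) || return 1
    printf 'echo "hello from the hidden script"\n' \
        | gzip -c | swap_bytes > "$tmp"
    rc=0
    swap_bytes < "$tmp" | gzip -dc | sh || rc=$?
    rm -f "$tmp"
    return "$rc"
}

demo_stage0
```

Because the swap is its own inverse, the same tr invocation both
disguises the file and restores it.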

In 5.6.1, the stage 1 script will abort right away if the operating
system doesn't identify as "Linux" with uname(1).  The script runs
another pipe of commands that decompresses good-large_compressed.lzma,
picks some chunks of the result, replaces some byte values to turn
it into a valid LZMA data stream, extracts the content--let's call
it stage 2--and feeds it into a shell.  The data manipulation in
stage 1 uses the head(1) command with the "-c" command flag, which
isn't available on OpenBSD.
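
A minimal sketch of the two ingredients mentioned above, an early
uname(1) guard and byte-wise cutting with "head -c".  The function
names and the optional argument to the guard are made up for
illustration; this is not the stage 1 script itself.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Succeeds only if the given OS name (default: output of uname) is "Linux".
is_linux() {
    [ "${1:-$(uname)}" = "Linux" ]
}

# Print the first N bytes of standard input. Not every head(1)
# implementation accepts -c, so this can fail outside GNU userlands.
first_bytes() {
    head -c "$1"
}

if is_linux; then
    echo "uname reports Linux"
else
    echo "uname reports something else; a guarded script would stop here"
fi

printf 'abcdefgh' | first_bytes 3
echo
```

On a system whose head(1) lacks "-c", first_bytes fails, so a script
built on it would break there even without the uname guard.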

In 5.6.1 there is another early attempt in the stage 2 script to
verify that the operating system is Linux; however, the syntax is
broken, so it doesn't actually do anything.  The stage 2 script runs
quite a number of tests to ensure that the environment in which it
executes is the one it expects: details of the directory tree,
details of the files generated by configure, that the platform is
x86-64 Linux, that the compiler is gcc and the linker GNU ld, that
the IFUNC feature is available, that it runs as part of a .deb or
.rpm package build.  If any single one of those tests fails, the
script aborts right away.  If everything checks out, stage 2 again
runs a series of data manipulation commands to extract from
good-large_compressed.lzma two object files and injects them into
the build to generate a manipulated liblzma.  Various checks that
stage 2 performs will fail on OpenBSD and again it relies on
"head -c" and now also on the GNU version of sed(1) to perform the
required data manipulations.

For the actual code inserted into liblzma on Linux x86-64, I have
to refer to the ongoing reverse engineering performed by the Linux
people.  It is my understanding that its code is triggered by an
IFUNC constructor during dynamic linking that checks that it is in
the address space of a /usr/sbin/sshd process and then proceeds to
redirect an RSA signature verification routine to its own malicious
code.  Liblzma ends up dynamically linked to sshd because of a
systemd-related extension added by many Linux packagers that pulls
in liblzma as an unrelated dependency.  The actual backdoor is
triggered by an SSH connection that authenticates with a certificate
that includes an RSA public key, part of which is a payload that
is checked against a fingerprint, then verified for a correct Ed448
signature with a key only the attacker knows, and then this content
is directly executed in a shell spawned by sshd for remote code
execution.

The build stage of the backdoor is well hidden.  The stage 0 shell
snippet looks at first glance like a plausible part of the poorly
readable autoconf/automake tooling.  The test files that hide the
further stages and actual backdoor code are unsuspicious by themselves.
5.6.1 added further tests to abort early on non-Linux platforms,
presumably so that nobody examining build problems would stumble
over anything suspicious.  I think the check for a .deb or .rpm
build is intended to inject the backdoor only during automated
package building, so people developing or debugging xz would not
accidentally discover it in the build directory.

I can identify four commits in the xz Git repository that are related
to the backdoor.  In chronological order:

2024-02-23 cf44e4b Tests: Add a few test files.
2024-03-09 82ecc53 liblzma: Fix false Valgrind error report with GCC.
2024-03-09 8c9b8b2 liblzma: Fix typos in crc32_fast.c and crc64_fast.c.
2024-03-09 6e63681 Tests: Update two test files.

cf44e4b and 6e63681 directly add and update hidden malicious code.
Aside from its documented change, 82ecc53 introduces an unmotivated
whitespace change...

-       return is_arch_extension_supported()   
+       return  is_arch_extension_supported()

... which is then reverted by 8c9b8b2.  The stage 2 script actually
relies on matching "return is_arch_extension_supported", so 82ecc53
breaks the backdoor injection and 8c9b8b2 restores it.  Maybe a
change intended for testing by the malware author accidentally
slipped in.
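
To see why a single extra space matters, here is a small sketch.  It
assumes a fixed-string match with grep -F as a stand-in for whatever
matching the stage 2 script really performs; has_marker is a made-up
name.

```shell
#!/bin/sh
# Sketch only: grep -F stands in for the script's actual matching.

# Succeeds if the line contains the exact text with a single space.
has_marker() {
    printf '%s\n' "$1" | grep -q -F 'return is_arch_extension_supported'
}

has_marker '        return is_arch_extension_supported()' \
    && echo "one space: match"
has_marker '        return  is_arch_extension_supported()' \
    || echo "two spaces: no match"
```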

Another malicious commit, entirely unrelated to the backdoor, is

2024-02-26 328c52d Build: Fix Linux Landlock feature test in Autotools and CMake builds.

This introduces a syntax error that breaks Landlock detection when
using CMake instead of the autotools build framework, so the Linux
sandboxing is disabled in this case.  The syntax error is a single
period '.' as the first character on an otherwise empty line of C
code.  That is designed so it will be easily missed.  It does not
plausibly pass for a typo because no typical editing glitch will
leave a '.' character there.
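
The effect of such a stray character on a compile-based feature test
can be sketched as follows.  This is not the xz build code: try_compile
and the two sample sources are hypothetical, and a trivial program
stands in for the real Landlock probe.  The sketch needs a C compiler
reachable as cc (or via CC) and does nothing otherwise.

```shell
#!/bin/sh
# Sketch only: try_compile and the sample sources are made up for
# illustration; this is not the xz feature test.

# Succeeds only if the C compiler accepts the source text given as $1.
try_compile() {
    dir=$(mktemp -d) || return 2
    printf '%s\n' "$1" > "$dir/probe.c"
    rc=0
    ${CC:-cc} -c "$dir/probe.c" -o "$dir/probe.o" >/dev/null 2>&1 || rc=$?
    rm -rf "$dir"
    return "$rc"
}

good='int main(void) { return 0; }'
# The same program preceded by a lone period on its own line.
bad='.
int main(void) { return 0; }'

if command -v "${CC:-cc}" >/dev/null 2>&1; then
    try_compile "$good" && echo "clean probe: feature reported as available"
    try_compile "$bad" || echo "stray period: feature reported as missing"
fi
```

A feature test of this kind cannot tell "the feature is absent" from
"the probe itself does not compile", so the broken probe silently
turns the feature off.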

I'm not aware of any clearly malicious commit before 2024-02-23.

I'll conclude this brain dump by pointing out that much of the
emerging narrative about this backdoor that you can read all over
the net is based on idle speculation and selective interpretation
of facts.

-- 
Christian "naddy" Weisgerber                          na...@mips.inka.de
