backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Giovanni Biscuolo
Hello everybody,

I know for sure that Guix maintainers and developers are working on
this, I'm just asking to find some time to inform and possibly discuss
with users (also in guix-devel) on what measures GNU Guix - the software
distribution - can/should deploy to try to avoid this kind of attacks.

Please consider that this (sub)thread is _not_ specific to xz-utils but
to the specific attack vector (matrix?) used to inject a backdoor in a
binary during a build phase, in a _very_ stealthy way.

Also, since Guix _is_ downstream, I'd like this (sub)thread to
concentrate on what *Guix* can/should do to strengthen the build process
/independently/ of what upstreams (or other distributions) can/should
do.

First of all, I understand the xz backdoor attack was complex (both
socially and technically) and all the details are still under scrutiny,
but AFAIU the way the backdoor was injected by "infecting" the
**build phase** of the software (and obfuscating the payload in
binaries) is very alarming and is something all distributions aiming at
reproducible builds must examine very carefully (and they actually
_are_ examining it).

John Kehayias  writes:

[...]

> On Fri, Mar 29, 2024 at 01:39 PM, Felix Lechner via Reports of security 
> issues in Guix itself and in packages provided by Guix wrote:
>
>> Hi Ryan,
>>
>> On Fri, Mar 29 2024, Ryan Prior wrote:

[...]

>>> Guix currently packages xz-utils 5.2.8 as "xz" using the upstream
>>> tarball. [...] Should we switch from using upstream tarballs to some
>>> fork with more responsible maintainers?
>>
>> Guix's habit of building from tarballs is a poor idea because tarballs
>> often differ.

First of all: should software be considered reproducible if it produces
different binaries when compiled from the source code repository (git or
otherwise) than when compiled from the official released source tarball?

My first thought is no.

>> For example, maintainers may choose to ship a ./configure script that
>> is otherwise not present in Git (although a configure.ac might be).
>> Guix should build from Git.

Two useful pointers explaining how the backdoor has been injected are
[1] (general workflow) and [2] (payload obfuscation)

The first and *indispensable* condition for the attack to be successful
is this:

--8<---cut here---start->8---

* The release tarballs upstream publishes don't have the same code that
 GitHub has. This is common in C projects so that downstream consumers
 don't need to remember how to run autotools and autoconf. The version
 of build-to-host.m4 in the release tarballs differs wildly from the
 upstream on GitHub.

[...]

* Explain dist tarballs, why we use them, what they do, link to
  autotools docs, etc

 * "Explaining the history of it would be very helpful I think. It also
 explains how a single person was able to insert code in an open source
 project that no one was able to peer review. It is pragmatically
 impossible, even if technically possible once you know the problem is
 there, to peer review a tarball prepared in this manner."

--8<---cut here---end--->8---
(from [1])

Let me highlight this: «It is pragmatically impossible [...] to peer
review a tarball prepared in this manner.»

There is no doubt that the release tarball is a much weaker "trusted
source" (trusted by peer review, not by authority) than the upstream
DVCS repository.

It's *very* noteworthy that the backdoor was discovered thanks to a
performance issue and _not_ during a peer review of the source
code... the _build_ code *is* source code, no?

It's not the first time a source release tarball of free software has
been compromised [3], but the way the compromise worked in this case is
something new (or at least never spotted before, right?).

> We discussed a bit on #guix today about this. A movement to sourcing
> more directly from Git in general has been discussed before, though
> has some hurdles.

Could someone knowledgeable about the details please describe the
hurdles of sourcing from a DVCS (possibly one other than git)?

> I will let someone more knowledgeable about the details chime in, but
> yes, something we should do.

I'm definitely _not_ the knowledgeable one, but I'd like to share the
results of my research.

Is it possible to enhance our build-system(s) (e.g. gnu-build-system) so
they can /ignore/ pre-built .m4 or similar scripts and rebuild them
during the build process?

Richard W.M. Jones on fedora-devel ML proposed [4]:

--8<---cut here---start->8---

(1) We should routinely delete autoconf-generated cruft from upstream
projects and regenerate it in %prep. It is easier to study the real
source rather than dig through the convoluted, generated shell script in
an upstream './configure' looking for back doors. For most projects,
just running "autoreconf -fiv" is enough.

--8<---cut here---end--->8---
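A rough shell sketch of the workflow Jones describes — delete the
generated autotools output and rebuild it from the real sources. The
file names below are hypothetical stand-ins, not the actual xz tree; the
list of generated files varies per project:

```shell
# A minimal sketch: treat generated autotools output as untrusted build
# artifacts, delete it, and regenerate from the real sources
# (configure.ac, Makefile.am).  "demo" is a hypothetical source tree.
set -e
mkdir -p demo/m4
touch demo/configure.ac demo/Makefile.am                # real sources: keep
touch demo/configure demo/Makefile.in demo/aclocal.m4   # generated: drop
touch demo/m4/build-to-host.m4                          # generated: drop
rm -f demo/configure demo/Makefile.in demo/aclocal.m4 demo/m4/*.m4
# ...then regenerate everything from source (needs autoconf/automake):
# (cd demo && autoreconf -fiv)
ls demo
```

After the `rm`, only the human-written inputs remain, so a tampered
`build-to-host.m4` shipped in a tarball simply never reaches the build.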

There is an interesting 

Re: Coordinators for patch review session on Tuesday

2024-04-04 Thread Steve George
Hi,

Comments below:

On 3 Apr, Christina O'Donnell wrote:
(...)
> Thank you for writing this up in so much depth! I've reviewed [1] and tried
> to tag it as reviewed-looks-good, though I don't think that has gone
> through. If you or someone else could take a look at it then I'd appreciate
> that. I plan on reviewing some more patches this evening.
> 
> Kind regards,
> Christina
> 
> [1] https://debbugs.gnu.org/cgi-bin/bugreport.cgi?users=guix;bug=65938
>

1. Changing the tag to reviewed-looks-good

It doesn't look like this worked. The way to do this is in the instructions at
step 4, 'Set a user tag' [0]; probably the easiest way is to send an email (I do
get funny results sometimes with my email client):

Subject: setting usertag on 65938

user guix
usertag 65938 + reviewed-looks-good
quit

The first line is important: it has to be 'user guix' for it to appear on the
patch review reports [1]. I think I messed up the instructions in the Wiki -
you have to have a + in between the bug number and the tag you want to set
(sorry about that). Please try again.

This is really just a way of signalling that reviews are happening - so trying 
to keep us in sync. The usertags we're using are:

- patch-review-hackers-list
- under-review
- escalated-review-request 
- waiting-on-contributor
- reviewed-looks-good

The patch changes all look reasonable to me; you've already done a lot:

1. You should add a reviewed-by trailer:
Reviews are contributions from our community (and work!) so we should recognise 
them and add trailers. It also helps the maintainer know who did the review and 
therefore the level of confidence.

Basically just add 'Reviewed-by: A Person  - [2]

It looks like your updated patch retriggered QA, so if you look here and
follow the Data Service link on the right you can see it's building it:

  https://qa.guix.gnu.org/issue/65938

The last step will be for a maintainer to see that it's built correctly, see
your review, and apply it - great job for a first patch review!

Steve / Futurile

[0] 
https://libreplanet.org/wiki/Group:Guix/PatchReviewSessions2024#Patch_review_process_-_CLI_tools
[1] 
https://libreplanet.org/wiki/Group:Guix/PatchReviewSessions2024#Patch_review_states_and_reports
[2] 
https://libreplanet.org/wiki/Group:Guix/PatchReviewSessions2024#10._Add_a_Reviewed-by_Trailer



Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Attila Lendvai
> Also, in (info "(guix) origin Reference") I see that Guix packages can have a
> list of uri(s) for the origin of source code, see xz as an example [7]:
> are they intended to be multiple independent sources to be compared in
> order to prevent possible tampering or are they "just" alternatives to
> be used if the first listed uri is unavailable?


a source origin is identified by its cryptographic hash (stored in its sha256 
field); i.e. it doesn't matter *where* the source archive was acquired from. if 
the hash matches the one in the package definition, then it's the same archive 
that the guix packager has seen while packaging.
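attila's point can be illustrated with a small self-contained sketch —
here `expected` plays the role of the sha256 recorded in a package
definition at packaging time, and the file is a stand-in, not a real
tarball:

```shell
# Sketch of content addressing: the origin URL doesn't matter because
# the sha256 pins the exact bytes.  "expected" stands in for the hash
# recorded in a Guix package definition when the package was created.
set -e
printf 'pretend-tarball-contents\n' > archive.tar
expected=$(sha256sum archive.tar | cut -d' ' -f1)
# later, from whatever mirror happened to serve the file, verify first:
actual=$(sha256sum archive.tar | cut -d' ' -f1)
if [ "$actual" = "$expected" ]; then
  echo "hash ok: same archive the packager saw"
else
  echo "hash mismatch: refuse to build" >&2
  exit 1
fi
```

Any mirror (or attacker) serving different bytes produces a different
hash and the build is refused; what the hash cannot tell you is whether
the original archive the packager saw was itself clean.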

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“We’ll know our disinformation program is complete when everything the American 
public believes is false.”
— William Casey (1913–1987), the director of CIA 1981-1987




Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Giovanni Biscuolo
Hello,

a couple of additional (IMO) useful resources...

Giovanni Biscuolo  writes:

[...]

> Let me highlight this: «It is pragmatically impossible [...] to peer
> review a tarball prepared in this manner.»
>
> There is no doubt that the release tarball is a very weak "trusted
> source" (trusted by peer review, not by authority) than the upstream
> DVCS repository.

This kind of attack was described by Daniel Stenberg in his «HOWTO
backdoor curl» article of 2021-03-30 as the "skip-git-altogether" method:

https://daniel.haxx.se/blog/2021/03/30/howto-backdoor-curl/
--8<---cut here---start->8---

The skip-git-altogether methods

As I’ve described above, it is really hard even for a skilled developer
to write a backdoor and have that landed in the curl git repository and
stick there for longer than just a very brief period.

If the attacker instead can just sneak the code directly into a release
archive then it won’t appear in git, it won’t get tested and it won’t
get easily noticed by team members!

curl release tarballs are made by me, locally on my machine. After I’ve
built the tarballs I sign them with my GPG key and upload them to the
curl.se origin server for the world to download. (Web users don’t
actually hit my server when downloading curl. The user visible web site
and downloads are hosted by Fastly servers.)

An attacker that would infect my release scripts (which btw are also in
the git repository) or do something to my machine could get something
into the tarball and then have me sign it and then create the “perfect
backdoor” that isn’t detectable in git and requires someone to diff the
release with git in order to detect – which usually isn’t done by anyone
that I know of.

[...] I of course do my best to maintain proper login sanitation,
updated operating systems and use of safe passwords and encrypted
communications everywhere. But I’m also a human so I’m bound to do
occasional mistakes.

Another way could be for the attacker to breach the origin download
server and replace one of the tarballs there with an infected version,
and hope that people skip verifying the signature when they download it
or otherwise notice that the tarball has been modified. I do my best at
maintaining server security to keep that risk to a minimum. Most people
download the latest release, and then it’s enough if a subset checks the
signature for the attack to get revealed sooner rather than later.

--8<---cut here---end--->8---
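The "diff the release with git" check that Stenberg says almost nobody
performs can be sketched like this — the two directories below are
hypothetical stand-ins for a git export and an unpacked release tarball:

```shell
# Sketch of diffing a release tarball against the VCS contents.  Any
# file the tarball adds or changes relative to git shows up in the diff
# -- exactly where the malicious build-to-host.m4 hid in the xz case.
set -e
mkdir -p from-git from-tarball/m4
echo 'AC_INIT([demo],[1.0])' | tee from-git/configure.ac > from-tarball/configure.ac
touch from-tarball/m4/build-to-host.m4   # present only in the tarball
diff -r from-git from-tarball || true    # reports: Only in from-tarball: m4
```

Routinely running such a diff in CI (or in a distribution's import
tooling) would have flagged the xz tarball's extra .m4 payload.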

Unfortunately, in that section Stenberg misses one attack vector he
mentioned in an earlier section of the article, named "The tricking a
user method":

--8<---cut here---start->8---

We can even include more forced “convincing” such as direct threats
against persons or their families: “push this code or else…”. This way
of course cannot be protected against using 2fa, better passwords or
things like that.

--8<---cut here---end--->8---

...and an attack vector involving more subtle means (let's call it
distributed social engineering) to convince the upstream developer,
other contributors, and/or third parties that the project needs a
co-maintainer authorized to publish _official_ release tarballs.

Following Stenberg's attacks classification, since the supply-chain
attack was intended to install a backdoor in the _sshd_ service, and
_not_ in xz-utils or liblzma, we can classify this attack as:

  skip-git-altogether to install a backdoor further-down-the-chain,
  precisely in a _dependency_ of the attacked one, during a period of
  "weakness" of the upstream maintainers

Stenberg closes his article with this update and one related reply to a
comment:

--8<---cut here---start->8---

Dependencies

Added after the initial post. Lots of people have mentioned that curl
can get built with many dependencies and maybe one of those would be an
easier or better target. Maybe they are, but they are products of their
own individual projects and an attack on those projects/products would
not be an attack on curl or backdoor in curl by my way of looking at it.

In the curl project we ship the source code for curl and libcurl and the
users, the ones that builds the binaries from that source code will get
the dependencies too.

[...]

 Jean Hominal says: 
 April 1, 2021 at 14:04 

 I think the big difference why you “missed” dependencies as an attack
 vector is because today, most application developers ship their
 dependencies in their application binaries (by linking statically or
 shipping a container) – in such a case, I would definitely count an
 attack on such a dependency, that is then shipped as part of the
 project’s artifacts, as a successful attack on the project.

 However, as you only ship a source artifact – of course, dependencies
 *are* out of scope in your case.

 Daniel Stenberg says: 
 April 1, 2021 at 15:05 

 Jean: Right. I

Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Giovanni Biscuolo
Hi Attila,

Attila Lendvai  writes:

>> Also, in (info "(guix) origin Reference") I see that Guix packages
>> can have a list of uri(s) for the origin of source code, see xz as an
>> example [7]: are they intended to be multiple independent sources to
>> be compared in order to prevent possible tampering or are they "just"
>> alternatives to be used if the first listed uri is unavailable?
>
> a source origin is identified by its cryptographic hash (stored in its
> sha256 field); i.e. it doesn't matter *where* the source archive was
> acquired from. if the hash matches the one in the package definition,
> then it's the same archive that the guix packager has seen while
> packaging.

Ehrm, you are right, mine was a stupid question :-)

We *are* already verifying that tarballs have not been tampered
with... by anyone but the release manager :-(

[...]

Happy hacking! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures


signature.asc
Description: PGP signature


Re: Coordinators for patch review session on Tuesday

2024-04-04 Thread Christina O'Donnell

Hi,

Thanks for your reply,


> 1. Changing the tag to reviewed-looks-good
>
> It doesn't look like this worked. The way to do this is in the instructions at
> step 4, 'Set a user tag' [0]; probably the easiest way is to send an email (I do
> get funny results sometimes with my email client):
>
> Subject: setting usertag on 65938
>
> user guix
> usertag 65938 + reviewed-looks-good
> quit
>
> The first line is important: it has to be 'user guix' for it to appear on the
> patch review reports [1]. I think I messed up the instructions in the Wiki -
> you have to have a + in between the bug number and the tag you want to set
> (sorry about that). Please try again.


Ah, I got it this time. I was missing the 'user guix'. I didn't read the
wiki and tried to look it up from the debbugs documentation.



> This is really just a way of signalling that reviews are happening - so trying
> to keep us in sync. The usertags we're using are:
>
> - patch-review-hackers-list
> - under-review
> - escalated-review-request
> - waiting-on-contributor
> - reviewed-looks-good
If I change the patch quite a lot, should I mark it as
'escalated-review-request' instead of 'reviewed-looks-good'?

And should I remove them from the patch-review-hackers-list after I've
responded?

> The patch changes all look reasonable to me; you've already done a lot:

Great, thanks! Good to know I'm doing things vaguely right!

> 1. You should add a reviewed-by trailer:
> Reviews are contributions from our community (and work!) so we should recognise
> them and add trailers. It also helps the maintainer know who did the review and
> therefore the level of confidence.
>
> Basically just add 'Reviewed-by: A Person  - [2]

Sure, do you want me resubmit these patches to add that?

> It looks like your updated patch retriggered QA, so if you look here and
> follow the Data Service link on the right you can see it's building it:
>
>   https://qa.guix.gnu.org/issue/65938

> The last step will be for a maintainer to see that it's built correctly, see
> your review, and apply it - great job for a first patch review!

Wonderful! The first of many, I'm hoping.

Kind regards,
Christina



Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Attila Lendvai
> Are really "configure scripts containing hundreds of thousands of lines
> of code not present in the upstream VCS" the norm?


pretty much for all C and C++ projects that use autoconf... which is numerous, 
especially among the core GNU components.


> If so, can we consider hundreds of thousand of lines of configure
> scripts and other (auto)generated files bundled in release tarballs
> "pragmatically impossible" to be peer reviewed?


yes.


> Can we consider that artifacts as sort-of-binary and "force" our
> build-systems to regenerate all them?


that would be a good practice.


> ...or is it better to completely avoid release tarballs as our sources
> uris?


yes, and this^ would guarantee the previous point, but it's not always trivial.

as an example see this: https://issues.guix.gnu.org/61750

in short: when building shepherd from git, the man files need to be generated 
using the program help2man. this invokes the binary with --help and formats the 
output as a man page. the usefulness of this is questionable, but the point is 
that it breaks cross-compilation, because the host cannot execute the target 
binary.

but these generated man files are part of the release tarball, so cross 
compilation works fine using the tarball.

all in all, just by following my gut instincts, i was advocating for building 
everything from git even before the exposure of this backdoor. in fact, i found 
it surprising as a guix newbie that not everything is built from git (or their 
VCS of choice).

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“For if you [the rulers] suffer your people to be ill-educated, and their 
manners to be corrupted from their infancy, and then punish them for those 
crimes to which their first education disposed them, what else is to be 
concluded from this, but that you first make thieves [and outlaws] and then 
punish them.”
— Sir Thomas More (1478–1535), 'Utopia', Book 1




A paper about Plan 9 and Guix

2024-04-04 Thread Edouard Klein
Dear Guix developers,

A paper of mine has been accepted as a Work in Progress at the next
International Workshop on Plan 9 (http://iwp9.org/).

I'll be presenting it not next weekend, but the one after (12-14 April
2024).

I'd be happy if some of you would be so kind as to read it with your
extensive knowledge of Guix, in case I've made a mistake somewhere.

https://the-dam.org/docs/explanations/Plan9ListenOnLinux.html

Thanks in advance,

Cheers,

Edouard.



Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Ekaitz Zarraga

Hi,

I just want to add some perspective from the bootstrapping.

On 2024-04-04 21:48, Attila Lendvai wrote:


all in all, just by following my gut insctincts, i was advodating for building 
everything from git even before the exposure of this backdoor. in fact, i found 
it surprising as a guix newbie that not everything is built from git (or their 
VCS of choice).


That has happened to me too.
Why not always use Git directly?

For bootstrapping it's also a problem, as all those tools (autotools) 
must themselves be bootstrapped, and they require other programs 
(compilers) that actually use them. We'd also be forced to use git, or 
at least clone the bootstrapping repos, git-archive them ourselves, and 
host them properly signed. At least we could then challenge them using 
git (similar to what we do with substitutes), which we cannot do right 
now with release tarballs against the actual code of the repository.


In live-bootstrap they just write the build scripts by hand, and ignore 
whatever the ./configure script says. That's also a reasonable way to 
tackle bootstrapping, but it's a hard one. Thankfully, we are working 
together in this bootstrapping effort, so we can learn from them and 
adapt their recipes to our Guix commencement.scm module. It would take 
some effort, but it's actually doable.


Hope this adds something useful to the discussion,

Ekaitz




Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)

2024-04-04 Thread Ricardo Wurmus
[mu4e must have changed the key bindings for replies, so here is my mail
again, this time as a wide reply.]

Giovanni Biscuolo  writes:

> So AFAIU using a fixed "autoreconf -fi" should mitigate the risks of
> tampered .m4 macros (and other possibly tampered build configuration
> script)?
>
> IMHO "ignoring" (deleting) pre-built build scripts in Guix
> build-system(s) should be considered... or is /already/ so?

The gnu-build-system has a bootstrap phase, but it only does something
when a configure script does not already exist.  We sometimes force it
to bootstrap the build system when we patch configure.ac.
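A shell-level sketch of that conditional — "pkg" is a hypothetical
source tree, and the real phase is Scheme code inside Guix, not a shell
script:

```shell
# Minimal sketch of the bootstrap-phase logic Ricardo describes: only
# regenerate the build system when no ./configure script already exists.
set -e
mkdir -p pkg
touch pkg/configure.ac
if [ ! -x pkg/configure ]; then
  echo "no configure script: bootstrapping"   # a real phase would run autoreconf -vfi
else
  echo "configure shipped in tarball: bootstrap skipped"
fi
```

Always bootstrapping — unconditionally deleting any shipped `configure`
before this check — is exactly the change discussed here: it removes the
path where a tarball's pre-generated script is trusted as-is.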

In previous discussions there were no big objections to always
bootstrapping the build system files from autoconf/automake sources.

This particular backdoor relied on a number of obfuscations:

- binary test data.  Nobody ever looks at binaries.

- incomprehensibility of autotools output.  This one is fundamentally a
  social problem and easily extends to other complex build systems.  In
  the xz case, the instructions for assembling the shell snippets to
  inject the backdoor could hide in plain sight, just because configure
  scripts are expected to be near incomprehensible.  They contain no
  comments, are filled to the brim with portable (lowest common
  denominator) shell magic, and contain bizarrely named variables.

Not using generated output is a good idea anyway and removes the
requirement to trust that the release tarballs are faithful derivations
from the autotools sources, but given the bland complexity of build system
code (whether that's recursive Makefiles, CMake cruft, or the infamous
gorilla spit[1] of autotools) I don't see a good way out.

[1] 
https://www.gnu.org/software/autoconf/manual/autoconf-2.65/autoconf.html#History

> Given the above observation that «It is pragmatically impossible [...]
> to peer review a tarball prepared in this manner», I strongly doubt that
> a possible Makefile tampering _in_the_release_tarball_ is easy to peer
> review; I'd ask: is such an "automated analysis" (see above) feasible
> in a dedicated build-system phase?

I don't think it's feasible.  Since Guix isn't a regular user (the
target audience of configure scripts) it has no business depending on
generated configure scripts.  It should build these from source.

> In other words: what if the backdoor was injected directly in the source
> code of the *official* release tarball signed with a valid GPG signature
> (and obviously with a valid sha256 hash)?

A malicious maintainer can sign bad release tarballs.  A malicious
contributor can push signed commits that contain backdoors in code.

> Do upstream developer communities peer review release tarballs or they
> "just" peer review the code in the official DVCS?

Most do neither.  I'd guess that virtually *nobody* reviews tarballs
beyond automated tests (like what the GNU maintainers' GNUmakefile /
maint.mk does when preparing a release).

> Also, in (info "(guix) origin Reference") I see that Guix packages can have a
> list of uri(s) for the origin of source code, see xz as an example [7]:
> are they intended to be multiple independent sources to be compared in
> order to prevent possible tampering or are they "just" alternatives to
> be used if the first listed uri is unavailable?

They are alternative URLs, much like what the mirror:// URLs do.

> If the case is the first, a solution would be to specify multiple
> independent release tarballs for each package, so that it would be
> harder to compromise two release sources, but that is not something under
> Guix control.

We have hashes for this purpose.  A tarball that was modified since the
package definition has been published would have a different hash.  This
is not a statement about tampering, but only says that our expectations
(from the time of packaging) have not been met.

> All in all: should we really avoid the "pragmatically impossible to be
> peer reviewed" release tarballs?

Yes.

-- 
Ricardo