Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi,

Skyler Ferris writes:

> In short, I'm not sure that we actually get any value from checking the
> PGP signature for most projects. Either HTTPS is good enough or the
> attacker won. 99% of the time HTTPS is good enough (though it is notable
> that the remaining 1% has a disproportionate impact on the affected
> population).

When checking PGP signatures, you end up with a trust-on-first-use model: the first time, you download a PGP key that you know nothing about and authenticate code against it, which gives no information. On subsequent releases, though, you can ensure (ideally) that releases still originate from the same party.

HTTPS has nothing to do with that: it just proves that the web server holds a valid certificate for its domain name.

But really, the gold standard, if I dare forgo any form of modesty, is the ‘.guix-authorizations’ model, as it takes care of key distribution as well as authorization delegation and revocation.

https://doi.org/10.22152/programming-journal.org/2023/7/1

Ludo’.
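A minimal sketch of the trust-on-first-use idea described above, in Python. Everything here is an illustrative assumption rather than anything Guix does: the `check_key_tofu` helper, the JSON pin store, and the stand-in `fingerprint` function (a plain SHA-256 of the key bytes, not a real OpenPGP fingerprint).

```python
import hashlib
import json
from pathlib import Path


def fingerprint(key_bytes: bytes) -> str:
    """Stand-in fingerprint: SHA-256 of the raw key bytes (NOT an OpenPGP fingerprint)."""
    return hashlib.sha256(key_bytes).hexdigest()


def check_key_tofu(project: str, key_bytes: bytes, store: Path) -> str:
    """Return 'first-use', 'match', or 'MISMATCH' for the key offered for `project`."""
    pins = json.loads(store.read_text()) if store.exists() else {}
    seen = fingerprint(key_bytes)
    pinned = pins.get(project)
    if pinned is None:
        # First contact: there is nothing to compare against, so this says
        # nothing about who really controls the key. We can only remember it.
        pins[project] = seen
        store.write_text(json.dumps(pins, indent=2))
        return "first-use"
    # Later releases: continuity with the first download is all we can check.
    return "match" if pinned == seen else "MISMATCH"
```

The first call for a project can only record the key; the useful signal comes from later calls, where a `MISMATCH` means the release no longer comes from the party that was pinned (or that the key was rotated and needs a closer look).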
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
On 4/13/24 05:47, Giovanni Biscuolo wrote: > Hello Skyler, > > Skyler Ferris writes: > >> On 4/12/24 23:50, Giovanni Biscuolo wrote: >>> general reminder: please remember the specific scope of this (sub)thread > [...] > >>> (https://yhetil.org/guix/8734s1mn5p@xelera.eu/) >>> >>> ...and if needed read that message again to understand the context, >>> please. >>> >> I assume that this was an indirect response to the email I sent >> previously where I discussed the problems with PGP signatures on release >> files. > No, believe me! I'm sorry I gave you this impression. :-) > >> I believe that this was in scope > To be clear: not only I did not mean to say - even indirectly - that you > where out of scope _or_ that you did not understand the context. > > Also, I really did not mean to /appear/ as the "coordinator" of this > (sub)thread and even less to /appear/ as the one who decides what's in > scope and what's OT; obviously everyone is absolutely free to decide > what is in scope and that she or he understood the context . > >> because of the discussion about whether to use VCS checkouts which >> lack signatures or release tarballs which have signatures. > I still have not commented what you discussed just because I lack time, > not interest; if I can I'll do it ASAP™ :-( > > [...] > > Thanks! Gio' > Thanks for clarifying! Misunderstandings happen sometimes. I look forward to hearing your thoughts if you're able to find time to share them! =)
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi all, On 4/11/24 06:49, Andreas Enge wrote: > Am Thu, Apr 11, 2024 at 02:56:24PM +0200 schrieb Ekaitz Zarraga: >> I think it's just better to >> obtain the exact same code that is easy to find > The exact same code as what? Actually I often wonder when looking for > a project and end up with a Github repository how I could distinguish > the "original" from its clones in a VCS. With the signature by the > known (this may also be a wrong assumption, admittedly) maintainer > there is at least some form of assurance of origin. I think this assumption deserves a lot more scrutiny than it typically gets (this is a general statement not particular to your message; even the tails project gets this part of security wrong and they are generally diligent in their efforts). I find it difficult to download PGP keys with any degree of confidence. Often, I see a file with a signature and a key served by the same web page, all coming from the same server. PGP keys are only useful if the attacker compromised the information that the user is receiving from the web page (for example, by gaining control of the web server or compromising the HTTPS session). In the typical scenario I have encountered, the attacker would also replace the key and signature with ones that they generated themself. In short, I'm not sure that we actually get any value from checking the PGP signature for most projects. Either HTTPS is good enough or the attacker won. 99% of the time HTTPS is good enough (though it is notable that the remaining 1% has a disproportionate impact on the affected population). Some caveats: It's difficult for me to use web of trust effectively because I haven't met anyone who uses PGP keys IRL. I'm ultimately trusting my internet connection and servers which are either semi-centralized (there are not that many open keyservers, it's an oligopoly for lack of a better term) or have the problem described above. 
So maybe everyone else is using web of trust effectively and I don't know what I'm talking about. =) The key download could be compared to the "trust on first use" model that SSH uses. It's not clear to me how effective a simple text box saying "we rotated our keys so you need to re-download it!" would be, but I suspect that most people would download without a second thought. It might be interesting to add public keys and signature locations to package definitions and have Guix re-verify the signature when it downloads the source. This would provide more scrutiny when keys are rotated (because of the review process) and would prevent harm from the situation where the package author is re-downloading the key each time the software is updated. The review process also adds a significant layer of protection because an attacker would need to compromise the HTTPS session of the reviewer in addition to the original package author (assuming that the signature is re-checked by the reviewer; I'm not sure how often this happens in practice). In principle it should be difficult for an attacker to predict who will be reviewing which issue. However, if the pool of reviewers is small it would be easier for the attacker to predict this or just compromise all of the reviewers. Also, if there was some way for the attacker to launch a general attack on people working out of the Guix repository then the value of this protection becomes negligible. The above two paragraphs are somewhat at odds: if Guix has the public key baked in and knows where to download the signature, some reviewers might not double-check the key that they get from the website because Guix is doing it for them. On one hand, I generally think that automating security makes it worse because once it's automated there's a system of rules for attackers to manipulate. On the other hand, if we assume people aren't doing the things they need to then no amount of technical support will give us a secure system. 
How much is reasonable to expect of people? From my extremely biased perspective, it's difficult to say. >> and everybody is reading. > This is a steep claim! I agree that nobody reads generated files in > a release tarball, but I am not sure how many other files are actually > read. > > Andreas I would guess that the level of the protection is strongly correlated with the popularity of the project among developers who need to add features or fix bugs. I don't think anybody reads a source repository "cover to cover", but we rummage around in the code on an as-needed basis. It would probably be difficult to sneak something into core projects like glibc or gcc, but pretty easy to sneak something into "emojis-but-cooler.js". It would be better to have comprehensive audits of all the projects, but that's not something Guix can manage by itself. It could make it easier to free up resources for that task, but I digress. While it is hyperbolic to say that "with enough eyes, all bugs are shallow" there is a
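The idea above of recording public keys and signature locations in package definitions, so that key rotation goes through review, could be sketched roughly as follows. This is only an illustration in Python: `SourceMetadata`, its fields, and `review_flags` are hypothetical names and are not part of Guix's actual package record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical per-package fields; not Guix's real package record."""
    version: str
    source_url: str
    signature_url: Optional[str]
    signing_key_fingerprint: Optional[str]


def review_flags(old: SourceMetadata, new: SourceMetadata) -> list[str]:
    """List the signature-related changes a reviewer should check explicitly."""
    flags = []
    if old.signing_key_fingerprint != new.signing_key_fingerprint:
        if new.signing_key_fingerprint is None:
            flags.append("signing key removed")
        elif old.signing_key_fingerprint is None:
            flags.append("signing key added: verify it out of band")
        else:
            flags.append("signing key rotated: verify the new key out of band")
    if new.signing_key_fingerprint and not new.signature_url:
        flags.append("key pinned but no signature location given")
    return flags
```

An ordinary version bump with an unchanged key produces no flags, while a rotated key shows up as an explicit item in the patch under review instead of being silently re-downloaded by whoever updates the package.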
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi again,

On 4/12/24 23:50, Giovanni Biscuolo wrote:
> Hello,
>
> general reminder: please remember the specific scope of this (sub)thread
>
> --8<---cut here---start->8---
>
> Please consider that this (sub)thread is _not_ specific to xz-utils but
> to the specific attack vector (matrix?) used to inject a backdoor in a
> binary during a build phase, in a _very_ stealthy way.
>
> Also, since Guix _is_ downstream, I'd like this (sub)thread to
> concentrate on what *Guix* can/should do to strengthen the build process
> /independently/ of what upstreams (or other distributions) can/should
> do.
>
> --8<---cut here---end--->8---
> (https://yhetil.org/guix/8734s1mn5p@xelera.eu/)
>
> ...and if needed read that message again to understand the context,
> please.

I assume that this was an indirect response to the email I sent previously, where I discussed the problems with PGP signatures on release files. I believe that this was in scope because of the discussion about whether to use VCS checkouts, which lack signatures, or release tarballs, which have signatures. If the signatures on the release tarballs are not providing us with additional confidence, then we are not losing anything by switching to the VCS checkout. Analysis of the effectiveness of what upstream projects are doing is relevant when trying to determine what we are capable of doing. I also pointed out that a change to Guix such as adding signature metadata to packages could help make up for problems with upstream workflows, and how the review process provides additional confidence, demonstrating how this analysis is relevant to what we currently do or could possibly do. Please let me know if you think that this is incorrect.

Additionally, I need to correct something that I previously said. I stated this:

On 4/12/24 17:14, Skyler Ferris wrote:
> even the tails project gets this part of security wrong and they are
> generally diligent in their efforts

without first double-checking the current state of the project.
While this was true at one point, they have since updated their website and clearly explain the problem and what their new verification method is able to protect against at https://tails.net/contribute/design/download_verification/. I apologize for disseminating outdated information.
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hello Skyler,

Skyler Ferris writes:

> On 4/12/24 23:50, Giovanni Biscuolo wrote:
>> general reminder: please remember the specific scope of this (sub)thread

[...]

>> (https://yhetil.org/guix/8734s1mn5p@xelera.eu/)
>>
>> ...and if needed read that message again to understand the context,
>> please.
>>
> I assume that this was an indirect response to the email I sent
> previously where I discussed the problems with PGP signatures on release
> files.

No, believe me! I'm sorry I gave you this impression. :-)

> I believe that this was in scope

To be clear: I did not mean to say, even indirectly, that you were out of scope _or_ that you did not understand the context.

Also, I really did not mean to /appear/ as the "coordinator" of this (sub)thread, and even less to /appear/ as the one who decides what's in scope and what's OT; obviously everyone is absolutely free to decide what is in scope and whether she or he understood the context.

> because of the discussion about whether to use VCS checkouts which
> lack signatures or release tarballs which have signatures.

I still have not commented on what you discussed just because I lack time, not interest; if I can I'll do it ASAP™ :-(

[...]

Thanks! Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi Attila,

sorry for the delay in my reply. I'm wondering whether this (sub)thread should be "condensed" into a dedicated RFC (are RFCs an official workflow in Guix now?); if so, I volunteer to file such an RFC in the next weeks.

Attila Lendvai writes:

>> Are there other issues (different from the "host cannot execute target
>> binary") that make release tarballs indispensable for some upstream
>> projects?
>
> i didn't mean to say that tarballs are indispensable. i just wanted to
> point out that it's not as simple as going through each package
> definition and robotically changing the source origin from tarball to
> git repo. it costs some effort, but i don't mean to suggest that it's
> not worth doing.

OK understood, thanks!

[...]

> i think a good first step would be to reword the packaging guidelines
> in the doc to strongly prefer VCS sources instead of tarballs.

I agree.

>> Even if We™ (ehrm) find a solution to the source tarball reproducibility
>> problem (potentially allowing us to patch all the upstream makefiles
>> with specific phases in our packages definitions) are we really going to
>> start our own (or one managed by the reproducible build community)
>> "reproducible source tarballs" repository? Is this feasible?
>
> but why would that be any better than simply building from git? which,
> i think, would even take less effort.

I agree, I was just brainstorming.

[...]

Thanks, Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hello,

general reminder: please remember the specific scope of this (sub)thread

--8<---cut here---start->8---

Please consider that this (sub)thread is _not_ specific to xz-utils but to the specific attack vector (matrix?) used to inject a backdoor in a binary during a build phase, in a _very_ stealthy way.

Also, since Guix _is_ downstream, I'd like this (sub)thread to concentrate on what *Guix* can/should do to strengthen the build process /independently/ of what upstreams (or other distributions) can/should do.

--8<---cut here---end--->8---
(https://yhetil.org/guix/8734s1mn5p@xelera.eu/)

...and if needed read that message again to understand the context, please.

Andreas Enge writes:

> On Thu, Apr 11, 2024 at 02:56:24PM +0200, Ekaitz Zarraga wrote:
>> I think it's just better to
>> obtain the exact same code that is easy to find
>
> The exact same code as what?

The same code as what is contained in the official tool used by upstream to track their code, which is the one and _only_ place that is /pragmatically/ open to scrutiny by other upstream and _downstream_ contributors.

> Actually I often wonder when looking for a project and end up with a
> Github repository how I could distinguish the "original" from its
> clones in a VCS.

Working that out takes a little bit of "intelligence work", but it's something that downstream should really do: have a reasonable level of trust that the origin is really the upstream one.

But here we are /brainstorming/ about the very issue that led to the backdoor injection, and that issue is how to avoid "backdoor injections via build subversion exploiting semi-binary seeds in release tarballs" (see the scope above).

> With the signature by the known (this may also be a wrong assumption,
> admittedly) maintainer there is at least some form of assurance of
> origin.

We should definitely drop the idea of "trust by authority" as a sufficient requisite for verifiability, which is one assumption of reproducible builds.

The XZ backdoor injection absolutely demonstrates that one and just one _co-maintainer_ was able to hide a trojan in the _signed_ release tarball and the payload in the git archive (as a very obfuscated binary), so it was _the origin_ that was "infected". It's NOT important _who_ injected the backdoor (and it _was_ upstream), but _how_.

In other words, we need a _pragmatic_ way (possibly with helping tools) to "challenge the upstream authority" :-)

>> and everybody is reading.
>
> This is a steep claim! I agree that nobody reads generated files in
> a release tarball, but I am not sure how many other files are actually
> read.

Let's say that at least /someone/ should be _able_ to read the files, but in the attack we are considering /no one/ is _pragmatically_ able to read the (auto)generated semi-binary seeds in the release tarballs.

Security is a complex system, especially when considering the entire supply chain: let's focus on this _specific_ weakness of the supply chain. :-)

Ciao! Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hello,

Ludovic Courtès writes:

> Ekaitz Zarraga writes:
>
>> On 2024-04-04 21:48, Attila Lendvai wrote:
>>> all in all, just by following my gut instincts, i was advocating
>>> for building everything from git even before the exposure of this
>>> backdoor. in fact, i found it surprising as a guix newbie that not
>>> everything is built from git (or their VCS of choice).
>>
>> That has happened to me too.
>> Why not use Git directly always?
>
> Because it create{s,d} a bootstrapping issue. The
> “builtin:git-download” method was added only recently to guix-daemon and
> cannot be assumed to be available yet:
>
> https://issues.guix.gnu.org/65866

Fortunately this will help a lot with the "everything built from git" part of the "wishlist", but what about the non-zero number of "other upstream VCSs"?

[...]

> I think we should gradually move to building everything from
> source, i.e., fetching code from VCS and adding Autoconf & co. as inputs.
>
> This has been suggested several times before. The difficulty, as you
> point out, will lie in addressing bootstrapping issues with core
> packages: glibc, GCC, Binutils, Coreutils, etc. I’m not sure how to do
> that but…

does it have to be an "all or nothing" choice? I mean "continue using release tarballs" vs "use git" for "all"? If using git is unfeasible for bootstrapping reasons [1], why not continue using release tarballs with some _extra_ verification steps, and possibly add some automation to "lint" to help contributors and committers check that there are no "quasi-binary" seeds [2] hidden in release tarballs?

WDYT?

[...]

Thanks! Gio'

[1] or other reasons specific to a package that should be documented when needed, at least with a comment in the package definition

[2] the autogenerated files that are not pragmatically verifiable

--
Giovanni Biscuolo

Xelera IT Infrastructures
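The "lint" check suggested above could start from something as simple as listing the files that a release tarball ships but the VCS checkout does not contain. The following Python sketch assumes a tarball with a single top-level `name-version/` directory and a checkout on disk; `tarball_only_files` is a hypothetical helper, not an existing `guix lint` checker, and it compares paths only, so a file that exists in both places with different contents is not reported.

```python
import tarfile
from pathlib import Path, PurePosixPath


def tarball_only_files(tarball: Path, checkout: Path) -> list[str]:
    """Return paths present in the release tarball but absent from the VCS checkout."""
    with tarfile.open(tarball) as tar:
        members = [m.name for m in tar.getmembers() if m.isfile()]

    # Release tarballs usually wrap everything in a top-level "name-version/"
    # directory; drop that first component before comparing.
    in_tarball = set()
    for name in members:
        parts = PurePosixPath(name).parts
        if len(parts) > 1:
            in_tarball.add("/".join(parts[1:]))

    in_checkout = {
        p.relative_to(checkout).as_posix()
        for p in checkout.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(checkout).parts
    }
    return sorted(in_tarball - in_checkout)
```

For an Autotools project the result would typically include `configure`, the `Makefile.in` files and various `.m4` files: exactly the generated material that is hard to review and that a packager would want either regenerated or inspected.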
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi!

Andreas Enge writes:

> On Wed, Apr 10, 2024 at 03:57:20PM +0200, Ludovic Courtès wrote:
>> I think we should gradually move to building everything from
>> source, i.e., fetching code from VCS and adding Autoconf & co. as inputs.
>
> the big drawback of this approach is that we would lose maintainers'
> signatures, right?

Yes. But as Attila wrote, one can hope that they provide a way to authenticate at least part of their VCS history, for example with signed tags. (Ideally everyone would use ‘guix git authenticate’ of course.)

> Would the suggestion to use signed tarballs, but to autoreconf the
> generated files, not be a better compromise between trusting and
> distrusting upstream maintainers?

IMO starting from an authenticated VCS checkout is clearer.

Ludo’.
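As a rough illustration of authenticating a checkout through a signed tag, the Python sketch below runs `git verify-tag --raw` and looks for the signer's fingerprint in the GnuPG status output that git prints on standard error. It assumes OpenPGP-signed tags and the usual `VALIDSIG` status-line layout (signing-key fingerprint as the first argument, primary-key fingerprint as the tenth); the helper names and the `trusted` set are hypothetical, and this is far weaker than what ‘guix git authenticate’ provides.

```python
import subprocess


def validsig_fingerprints(status_output: str) -> set[str]:
    """Collect fingerprints from GnuPG 'VALIDSIG' status lines."""
    found = set()
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "[GNUPG:]" and fields[1] == "VALIDSIG":
            found.add(fields[2])        # key that made the signature
            if len(fields) >= 12:
                found.add(fields[11])   # primary key, when GnuPG reports it
    return found


def tag_signed_by(repo: str, tag: str, trusted: set[str]) -> bool:
    """True if `git verify-tag` succeeds and the signer is in `trusted`."""
    result = subprocess.run(
        ["git", "-C", repo, "verify-tag", "--raw", tag],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False
    return bool(validsig_fingerprints(result.stderr) & trusted)
```

The `trusted` fingerprints still have to come from somewhere, so on its own this is the same trust-on-first-use situation as with signed tarballs; the difference is only that the thing being authenticated is the history everyone can inspect.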
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
> > I think we should gradually move to building everything from > > source—i.e., fetching code from VCS and adding Autoconf & co. as inputs. > > > the big drawback of this approach is that we would lose maintainers' > signatures, right? it's possible to sign git commits and (annotated) tags, too. it's good practice to enable signing by default. admittedly though, few people sign all their commits, and even fewer sign their tags. -- • attila lendvai • PGP: 963F 5D5F 45C7 DFCD 0A39 -- “Never appeal to a man's "better nature". He may not have one. Invoking his self-interest gives you more leverage.” — Robert Heinlein (1907–1988), 'Time Enough For Love' (1973)
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi,

>> and everybody is reading.
>
> This is a steep claim! I agree that nobody reads generated files in a
> release tarball, but I am not sure how many other files are actually
> read.

Yea, it is. I'd also love to know how effective reading a release tarball is compared with reading a VCS repo. The quality of the reading is also very important. I simply don't even try to read a tarball: not having the history makes understanding very difficult. If I find a piece of code that seems odd, I would like to `git blame` it and see what the reason for its inclusion was, who included it, and so on.

It's not much, but it's better than nothing. Although I'd understand if you told me the history might be misleading, too.
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
On Thu, Apr 11, 2024 at 02:56:24PM +0200, Ekaitz Zarraga wrote:
> I think it's just better to
> obtain the exact same code that is easy to find

The exact same code as what? Actually I often wonder when looking for a project and end up with a Github repository how I could distinguish the "original" from its clones in a VCS. With the signature by the known (this may also be a wrong assumption, admittedly) maintainer there is at least some form of assurance of origin.

> and everybody is reading.

This is a steep claim! I agree that nobody reads generated files in a release tarball, but I am not sure how many other files are actually read.

Andreas
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi,

On 2024-04-11 14:43, Andreas Enge wrote:
> Hello,
>
> On Wed, Apr 10, 2024 at 03:57:20PM +0200, Ludovic Courtès wrote:
>> I think we should gradually move to building everything from
>> source, i.e., fetching code from VCS and adding Autoconf & co. as inputs.
>
> the big drawback of this approach is that we would lose maintainers'
> signatures, right? Would the suggestion to use signed tarballs, but to
> autoreconf the generated files, not be a better compromise between
> trusting and distrusting upstream maintainers?
>
> Andreas

Probably not, because the release tarballs might contain code that is not present in the Git history, and there are not that many eyes checking them. This time it was autoconf, but it might be anything else. The maintainers' machines can be hijacked too...

I think it's just better to obtain the exact same code that is easy to find and everybody is reading.
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hello,

On Wed, Apr 10, 2024 at 03:57:20PM +0200, Ludovic Courtès wrote:
> I think we should gradually move to building everything from
> source, i.e., fetching code from VCS and adding Autoconf & co. as inputs.

the big drawback of this approach is that we would lose maintainers' signatures, right? Would the suggestion to use signed tarballs, but to autoreconf the generated files, not be a better compromise between trusting and distrusting upstream maintainers?

Andreas
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi,

Ekaitz Zarraga writes:

> On 2024-04-04 21:48, Attila Lendvai wrote:
>> all in all, just by following my gut instincts, i was advocating
>> for building everything from git even before the exposure of this
>> backdoor. in fact, i found it surprising as a guix newbie that not
>> everything is built from git (or their VCS of choice).
>
> That has happened to me too.
> Why not use Git directly always?

Because it create{s,d} a bootstrapping issue. The “builtin:git-download” method was added only recently to guix-daemon and cannot be assumed to be available yet:

https://issues.guix.gnu.org/65866

> In the bootstrapping it's also a problem, as all those tools
> (autotools) must be bootstrapped, and they require other programs
> (compilers) that actually use them. And we'll be forced to use git,
> too, or at least clone the bootstrapping repos, git-archive them
> ourselves and host them properly signed. At least, we could challenge
> them using git (similar to what we do with the substitutes), which we
> cannot do right now with the release tarballs against the actual code
> of the repository.

I think we should gradually move to building everything from source, i.e., fetching code from VCS and adding Autoconf & co. as inputs.

This has been suggested several times before. The difficulty, as you point out, will lie in addressing bootstrapping issues with core packages: glibc, GCC, Binutils, Coreutils, etc. I’m not sure how to do that but…

> In live-bootstrap they just write the build scripts by hand, and
> ignore whatever the ./configure script says. That's also a reasonable
> way to tackle the bootstrapping, but it's a hard one. Thankfully, we
> are working together in this Bootstrapping effort so we can learn from
> them and adapt their recipes to our Guix commencement.scm module. This
> would be some effort, but it's actually doable.

… live-bootstrap can probably be a good source of inspiration to find a way to build those core packages (or some of them) straight from a VCS checkout.

And here the trick will be to find a way to do that in a concise and maintainable way (generating config.h and Makefiles by hand may prove unmaintainable in practice).

Ludo’.
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
On Thu, 04 Apr 2024 12:34:42 +0200 Giovanni Biscuolo wrote: > Hello everybody, > > I know for sure that Guix maintainers and developers are working on > this, I'm just asking to find some time to inform and possibly discuss > with users (also in guix-devel) on what measures GNU Guix - the > software distribution - can/should deploy to try to avoid this kind > of attacks. What about integrating ClamAV into the build farms (if this isn't a thing already)? ClamAV could scan source files and freshly-built packages and perhaps detect obvious malware. AFAIK it can also detect CVEs. Guix already has ClamAV packaged so this shouldn't be that hard. -- Jan Wielkiewicz
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
> Are there other issues (different from the "host cannot execute target
> binary") that make release tarballs indispensable for some upstream
> projects?

i didn't mean to say that tarballs are indispensable. i just wanted to point out that it's not as simple as going through each package definition and robotically changing the source origin from tarball to git repo. it costs some effort, but i don't mean to suggest that it's not worth doing.

> So, while "almost all the world" is applying wrong solutions to the
> source tarball reproducibility problem, what can Guix do?

AFAIU the plan is straightforward: change all package definitions to point to the (git) repos of the upstream, and ignore any generated ./configure script if it happens to be checked into the repo.

it involves quite some work, both in quantity and in thinking through the surprises that come up.

i think a good first step would be to reword the packaging guidelines in the doc to strongly prefer VCS sources instead of tarballs.

> Even if We™ (ehrm) find a solution to the source tarball reproducibility
> problem (potentially allowing us to patch all the upstream makefiles
> with specific phases in our packages definitions) are we really going to
> start our own (or one managed by the reproducible build community)
> "reproducible source tarballs" repository? Is this feasible?

but why would that be any better than simply building from git? which, i think, would even take less effort.

> > but these generated man files are part of the release tarball, so
> > cross compilation works fine using the tarball.
>
> AFAIU in this case there is an easy alternative: distribute the
> (generated) man files as code tracked in the DVCS (e.g. git) repo
> itself.

yes, that would work in this case (although that man page is guaranteed to go stale). my proposal was to simply drop the generated man file. it adds very little value (although it's not zero; web search, etc).
-- • attila lendvai • PGP: 963F 5D5F 45C7 DFCD 0A39 -- “It is easy to be conspicuously 'compassionate' if others are being forced to pay the cost.” — Murray N. Rothbard (1926–1995)
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi Attila and guix-security team,

Attila Lendvai writes:

>> Are really "configure scripts containing hundreds of thousands of lines
>> of code not present in the upstream VCS" the norm?
>
> pretty much for all C and C++ projects that use autoconf... which is
> numerous, especially among the core GNU components.

OK, thank you for the confirmation.

[...]

>> ...or is it better to completely avoid release tarballs as our sources
>> uris?
>
> yes, and this^ would guarantee the previous point, but it's not always
> trivial.
>
> as an example see this: https://issues.guix.gnu.org/61750

[...]

> it breaks crosscompilation, because the host cannot execute the target
> binary.

OK thanks, I missed that.

In general, is there really no other solution for projects than to distribute some artifacts "out of band" or give up cross-compiling?!?

Are there other issues (different from the "host cannot execute target binary") that make release tarballs indispensable for some upstream projects?

AFAIU the only thing that /could/ "save" source tarballs is their /scientific/ reproducibility. In this direction there is a very interesting patchset from Janneke Nieuwenhuizen to try to get a reproducible _Guix_ release tarball:

https://issues.guix.gnu.org/70169
«Reproducible `make dist' tarball in defiance of Autotools and Gettext»

Obviously having a reproducible tarball makes _practical_ the "pragmatically impossible" task of reproducing a release tarball to check whether it corresponds to the same **build** (make dist) performed in the official DVCS repo; only this could "save" the whole "build software using release tarballs" workflow.

...but /in general/ here we are _downstream_, we have absolutely no control over upstream, and it's _very_ unlikely that we'll see a *good* solution to the tarball reproducibility problem applied "in the wild upstream" soon.
I said "a *good* solution" because some proposals I'm reading about are /bad/ _complications_ that absolutely do NOT solve the source tarball reproducibility problem [1]; for example:

1. build the tarball on the RM host using a docker container (unreproducible build) and call it "a reproducible release tarball": https://medium.com/@lanoxx/creating-reproducible-release-tarballs-fa2e2ce745a7

2. have a CI system based on github actions [2] and call it "fully verifiable": https://externals.io/message/122811#122814 (from php.internals mailing list)

So, while "almost all the world" is applying _wrong_ solutions to the source tarball reproducibility problem, what can Guix do?

Even if We™ (ehrm) find a solution to the source tarball reproducibility problem (potentially allowing us to patch all the upstream makefiles with specific phases in our packages definitions) are we really going to start our own (or one managed by the reproducible build community) "reproducible source tarballs" repository? Is this feasible?

I think there is no solution that can "pragmatically save" the source tarballs of all the software packaged in Guix (and all other distributions part of the reproducible builds effort).

> but these generated man files are part of the release tarball, so
> cross compilation works fine using the tarball.

AFAIU *in this case* there is an easy alternative: distribute the (generated) man files as *code* tracked in the DVCS (e.g. git) repo itself. IMHO it's likely that this workflow can fix most if not all the crosscompilation issues, no?

In general, AFAIU it's against reproducibility to distribute pre-generated (compiled? transpiled?) artifacts in a tarball that are not present in the official DVCS repo, especially when tarballs are _not_ reproducible (and they are not in likely 99.9% of cases).

> all in all, just by following my gut instincts, i was advocating for
> building everything from git even before the exposure of this
> backdoor. in fact, i found it surprising as a guix newbie that not
> everything is built from git (or their VCS of choice).

Given the current situation so clearly exposed by the "xz backdoor" case, this is something Guix should seriously consider.

I mean: Guix should seriously consider dropping source tarballs and _also_ all pre-compiled artifacts distributed only via those tarballs. I don't like this proposal, but I see no other "pragmatically possible" solution.

AFAIU there is no need to rush, but I'm afraid that the class of attacks we can call "supply-chain backdoor injection due to source tarballs being pragmatically impossible to verify" is hard to deploy, but unfortunately not _too_ hard.

[...]

Thanks! Gio'

[1] this boils down to the unfortunate fact that "reproducibility" is a very misunderstood concept [1.1], even by some very skilled (experienced?) programmers

[1.1] because it's strictly related to good _redistribution_ of _trusted_ software, not to good programming

[2] https://docs.github.com/en/actions/learn-github-actions/understanding-github-actions#runners «each workflow run executes in a fresh, newly-provisioned
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
[mu4e must have changed the key bindings for replies, so here is my mail
again, this time as a wide reply.]

Giovanni Biscuolo writes:

> So AFAIU using a fixed "autoreconf -fi" should mitigate the risks of
> tampered .m4 macros (and other possibly tampered build configuration
> script)?
>
> IMHO "ignoring" (deleting) pre-built build scripts in Guix
> build-system(s) should be considered... or is /already/ so?

The gnu-build-system has a bootstrap phase, but it only does something
when a configure script does not already exist.  We sometimes force it
to bootstrap the build system when we patch configure.ac.

In previous discussions there were no big objections to always
bootstrapping the build system files from autoconf/automake sources.

This particular backdoor relied on a number of obfuscations:

- binary test data.  Nobody ever looks at binaries.

- incomprehensibility of autotools output.  This one is fundamentally a
  social problem and easily extends to other complex build systems.  In
  the xz case, the instructions for assembling the shell snippets to
  inject the backdoor could hide in plain sight, just because configure
  scripts are expected to be near incomprehensible.  They contain no
  comments, are filled to the brim with portable (lowest common
  denominator) shell magic, and contain bizarrely named variables.

Not using generated output is a good idea anyway and removes the
requirement to trust that the release tarballs are faithful derivations
from the autotools sources, but given the bland complexity of build
system code (whether that's recursive Makefiles, CMake cruft, or the
infamous gorilla spit[1] of autotools) I don't see a good way out.

[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.65/autoconf.html#History

> Given the above observation that <<It is pragmatically impossible [...]
> to peer review a tarball prepared in this manner>>, I strongly doubt that
> a possible Makefile tampering _in_the_release_tarball_ is easy to peer
> review; I'd ask: is it feasible such an "automated analysis" (see
> above) in a dedicated build-system phase?

I don't think it's feasible.  Since Guix isn't a regular user (the
target audience of configure scripts) it has no business depending on
generated configure scripts.  It should build these from source.

> In other words: what if the backdoor was injected directly in the source
> code of the *official* release tarball signed with a valid GPG signature
> (and obviously with a valid sha256 hash)?

A malicious maintainer can sign bad release tarballs.  A malicious
contributor can push signed commits that contain backdoors in code.

> Do upstream developer communities peer review release tarballs or they
> "just" peer review the code in the official DVCS?

Most do neither.  I'd guess that virtually *nobody* reviews tarballs
beyond automated tests (like what the GNU maintainers' GNUmakefile /
maint.mk does when preparing a release).

> Also, in (info "(guix) origin Reference") I see that Guix packages can have a
> list of uri(s) for the origin of source code, see xz as an example [7]:
> are they intended to be multiple independent sources to be compared in
> order to prevent possible tampering or are they "just" alternatives to
> be used if the first listed uri is unavailable?

They are alternative URLs, much like what the mirror:// URLs do.

> If the case is the first, a solution would be to specify multiple
> independent release tarballs for each package, so that it would be
> harder to compromise two release sources, but that is not something under
> Guix control.

We have hashes for this purpose.  A tarball that was modified since the
package definition has been published would have a different hash.
This is not a statement about tampering, but only says that our expectations (from the time of packaging) have not been met. > All in all: should we really avoid the "pragmatically impossible to be > peer reviewed" release tarballs? Yes. -- Ricardo
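
The point above, that the bootstrap phase only acts when no configure script exists, suggests one way to force regeneration: delete the generated files before bootstrapping. The following is only a minimal Python sketch of that idea, not Guix's implementation. The function names and the short list of file names are assumptions made for illustration; a real tree may ship other generated files (including .m4 macros such as those discussed in this thread) that this list does not cover.

```python
import os

# Assumption: a hand-picked set of file names that autotools commonly
# generates. This is illustrative only and is not a list used by Guix.
GENERATED_NAMES = {"configure", "aclocal.m4", "Makefile.in", "config.h.in"}


def find_generated_files(source_root):
    """Return sorted relative paths of files that look autotools-generated."""
    found = []
    for directory, _subdirs, files in os.walk(source_root):
        for name in files:
            if name in GENERATED_NAMES:
                full_path = os.path.join(directory, name)
                found.append(os.path.relpath(full_path, source_root))
    return sorted(found)


def strip_generated_files(source_root):
    """Delete the generated files so a later bootstrap must recreate them."""
    removed = find_generated_files(source_root)
    for relative in removed:
        os.remove(os.path.join(source_root, relative))
    return removed
```

After such a step, running something like "autoreconf -fi" in the tree would have to rebuild the build system from configure.ac and Makefile.am rather than reuse what the tarball shipped.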
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi,

I just want to add some perspective from the bootstrapping.

On 2024-04-04 21:48, Attila Lendvai wrote:

> all in all, just by following my gut instincts, i was advocating for
> building everything from git even before the exposure of this
> backdoor. in fact, i found it surprising as a guix newbie that not
> everything is built from git (or their VCS of choice).

That has happened to me too.  Why not always use Git directly?

In the bootstrapping it's also a problem, as all those tools (autotools)
must be bootstrapped, and they require other programs (compilers) that
actually use them.

And we'll be forced to use git, too, or at least clone the bootstrapping
repos, git-archive them ourselves and host them properly signed.  At
least, we could challenge them using git (similar to what we do with the
substitutes), which we cannot do right now with the release tarballs
against the actual code of the repository.

In live-bootstrap they just write the build scripts by hand, and ignore
whatever the ./configure script says.  That's also a reasonable way to
tackle the bootstrapping, but it's a hard one.  Thankfully, we are
working together in this Bootstrapping effort so we can learn from them
and adapt their recipes to our Guix commencement.scm module.  This would
take some effort, but it's actually doable.

Hope this adds something useful to the discussion,
Ekaitz
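
To make the idea of "challenging" a release tarball against the repository concrete, here is a small Python sketch that compares two tar archives file by file. It assumes one archive is the upstream release and the other was produced from a VCS checkout (for instance with git archive), and that each has a single top-level directory to strip. The function names are hypothetical and this is not an existing Guix tool.

```python
import hashlib
import io
import tarfile


def member_digests(tar_bytes):
    """Map each regular file (top-level directory stripped) to its SHA-256."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Drop the leading "name-version/" component so both archives align.
            _top, separator, rest = member.name.partition("/")
            relative = rest if separator else member.name
            data = archive.extractfile(member).read()
            digests[relative] = hashlib.sha256(data).hexdigest()
    return digests


def compare_archives(release_bytes, vcs_bytes):
    """Return (files only in the release, files whose content differs)."""
    release = member_digests(release_bytes)
    vcs = member_digests(vcs_bytes)
    only_in_release = sorted(set(release) - set(vcs))
    differing = sorted(
        path for path in release.keys() & vcs.keys()
        if release[path] != vcs[path]
    )
    return only_in_release, differing
```

Files that exist only in the release are typically generated ones (a configure script, for example) and would have to be regenerated rather than reviewed; files present in both but with different content are the ones that deserve a closer look.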
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
> Are really "configure scripts containing hundreds of thousands of lines
> of code not present in the upstream VCS" the norm?

pretty much for all C and C++ projects that use autoconf... which is
numerous, especially among the core GNU components.

> If so, can we consider hundreds of thousand of lines of configure
> scripts and other (auto)generated files bundled in release tarballs
> "pragmatically impossible" to be peer reviewed?

yes.

> Can we consider that artifacts as sort-of-binary and "force" our
> build-systems to regenerate all them?

that would be a good practice.

> ...or is it better to completely avoid release tarballs as our sources
> uris?

yes, and this^ would guarantee the previous point, but it's not always
trivial.

as an example see this: https://issues.guix.gnu.org/61750

in short: when building shepherd from git the man files need to be
generated using the program help2man. this invokes the binary with
--help and formats the output as a man page. the usefulness of this is
questionable, but the point is that it breaks crosscompilation, because
the host cannot execute the target binary.

but these generated man files are part of the release tarball, so cross
compilation works fine using the tarball.

all in all, just by following my gut instincts, i was advocating for
building everything from git even before the exposure of this backdoor.
in fact, i found it surprising as a guix newbie that not everything is
built from git (or their VCS of choice).

--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“For if you [the rulers] suffer your people to be ill-educated, and
their manners to be corrupted from their infancy, and then punish them
for those crimes to which their first education disposed them, what else
is to be concluded from this, but that you first make thieves [and
outlaws] and then punish them.”
— Sir Thomas More (1478–1535), 'Utopia', Book 1
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hi Attila,

Attila Lendvai writes:

>> Also, in (info "(guix) origin Reference") I see that Guix packages
>> can have a list of uri(s) for the origin of source code, see xz as an
>> example [7]: are they intended to be multiple independent sources to
>> be compared in order to prevent possible tampering or are they "just"
>> alternatives to be used if the first listed uri is unavailable?
>
> a source origin is identified by its cryptographic hash (stored in its
> sha256 field); i.e. it doesn't matter *where* the source archive was
> acquired from. if the hash matches the one in the package definition,
> then it's the same archive that the guix packager has seen while
> packaging.

Ehrm, you are right, mine was a stupid question :-)

We *are* already verifying that tarballs have not been tampered
with... by anyone other than the release manager :-(

[...]

Happy hacking! Gio'

--
Giovanni Biscuolo

Xelera IT Infrastructures
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hello, a couple of additional (IMO) useful resources... Giovanni Biscuolo writes: [...] > Let me highlight this: «It is pragmatically impossible [...] to peer > review a tarball prepared in this manner.» > > There is no doubt that the release tarball is a very weak "trusted > source" (trusted by peer review, not by authority) than the upstream > DVCS repository. This kind of attack was described by Daniel Stenberg in his «HOWTO backdoor curl» article in 2021.03.30 as "skip-git-altogether" method: https://daniel.haxx.se/blog/2021/03/30/howto-backdoor-curl/ --8<---cut here---start->8--- The skip-git-altogether methods As I’ve described above, it is really hard even for a skilled developer to write a backdoor and have that landed in the curl git repository and stick there for longer than just a very brief period. If the attacker instead can just sneak the code directly into a release archive then it won’t appear in git, it won’t get tested and it won’t get easily noticed by team members! curl release tarballs are made by me, locally on my machine. After I’ve built the tarballs I sign them with my GPG key and upload them to the curl.se origin server for the world to download. (Web users don’t actually hit my server when downloading curl. The user visible web site and downloads are hosted by Fastly servers.) An attacker that would infect my release scripts (which btw are also in the git repository) or do something to my machine could get something into the tarball and then have me sign it and then create the “perfect backdoor” that isn’t detectable in git and requires someone to diff the release with git in order to detect – which usually isn’t done by anyone that I know of. [...] I of course do my best to maintain proper login sanitation, updated operating systems and use of safe passwords and encrypted communications everywhere. But I’m also a human so I’m bound to do occasional mistakes. 

Another way could be for the attacker to breach the origin download
server and replace one of the tarballs there with an infected version,
and hope that people skip verifying the signature when they download it
or otherwise notice that the tarball has been modified. I do my best at
maintaining server security to keep that risk to a minimum. Most people
download the latest release, and then it’s enough if a subset checks the
signature for the attack to get revealed sooner rather than later.
--8<---cut here---end--->8---

Unfortunately Stenberg in that section misses one attack vector he
mentioned in a previous article section named "The tricking a user
method":

--8<---cut here---start->8---
We can even include more forced “convincing” such as direct threats
against persons or their families: “push this code or else…”. This way
of course cannot be protected against using 2fa, better passwords or
things like that.
--8<---cut here---end--->8---

...and an attack vector involving more subtle ways (let's call it
distributed social engineering) to convince the upstream developer and
other contributors and/or third parties they need a project
co-maintainer authorized to publish _official_ release tarballs.

Following Stenberg's classification of attacks, since the supply-chain
attack was intended to install a backdoor in the _sshd_ service, and
_not_ in xz-utils or liblzma, we can classify this attack as:

skip-git-altogether to install a backdoor further-down-the-chain,
precisely in a _dependency_ of the attacked one, during a period of
"weakness" of the upstream maintainers

Stenberg closes his article with this update and one related reply to a
comment:

--8<---cut here---start->8---
Dependencies

Added after the initial post. Lots of people have mentioned that curl
can get built with many dependencies and maybe one of those would be an
easier or better target.
Maybe they are, but they are products of their own individual projects and an attack on those projects/products would not be an attack on curl or backdoor in curl by my way of looking at it. In the curl project we ship the source code for curl and libcurl and the users, the ones that builds the binaries from that source code will get the dependencies too. [...] Jean Hominal says: April 1, 2021 at 14:04 I think the big difference why you “missed” dependencies as an attack vector is because today, most application developers ship their dependencies in their application binaries (by linking statically or shipping a container) – in such a case, I would definitely count an attack on such a dependency, that is then shipped as part of the project’s artifacts, as a successful attack on the project. However, as you only ship a source artifact – of course, dependencies *are* out of scope in your case. Daniel Stenberg says: April 1, 2021 at 15:05 Jean: Right.
Re: backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
> Also, in (info "(guix) origin Reference") I see that Guix packages can have a > list of uri(s) for the origin of source code, see xz as an example [7]: > are they intended to be multiple independent sources to be compared in > order to prevent possible tampering or are they "just" alternatives to > be used if the first listed uri is unavailable? a source origin is identified by its cryptographic hash (stored in its sha256 field); i.e. it doesn't matter *where* the source archive was acquired from. if the hash matches the one in the package definition, then it's the same archive that the guix packager has seen while packaging. -- • attila lendvai • PGP: 963F 5D5F 45C7 DFCD 0A39 -- “We’ll know our disinformation program is complete when everything the American public believes is false.” — William Casey (1913–1987), the director of CIA 1981-1987
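
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the digest is compared as a hexadecimal string, whereas Guix package definitions usually record the sha256 field in a base32 encoding.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex):
    """True when the archive is byte-identical to the one that was packaged."""
    return sha256_of_file(path) == recorded_hex.lower()
```

A match only shows that the archive is the same one the packager saw, whichever URL it came from. It says nothing about whether that archive was trustworthy in the first place.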
backdoor injection via release tarballs combined with binary artifacts (was Re: Backdoor in upstream xz-utils)
Hello everybody,

I know for sure that Guix maintainers and developers are working on
this; I'm just asking you to find some time to inform and possibly
discuss with users (also in guix-devel) what measures GNU Guix - the
software distribution - can/should deploy to try to avoid this kind of
attack.

Please consider that this (sub)thread is _not_ specific to xz-utils but
to the specific attack vector (matrix?) used to inject a backdoor in a
binary during a build phase, in a _very_ stealthy way.

Also, since Guix _is_ downstream, I'd like this (sub)thread to
concentrate on what *Guix* can/should do to strengthen the build process
/independently/ of what upstreams (or other distributions) can/should
do.

First of all, I understand the xz backdoor attack was complex (both
socially and technically) and all the details are still under scrutiny,
but AFAIU the way the backdoor has been injected by "infecting" the
**build phase** of the software (and obfuscating the payload in
binaries) is very alarming and is something all distributions aiming at
reproducible builds must examine very well (and they actually _are_
examining it).

John Kehayias writes:

[...]

> On Fri, Mar 29, 2024 at 01:39 PM, Felix Lechner via Reports of security
> issues in Guix itself and in packages provided by Guix wrote:
>
>> Hi Ryan,
>>
>> On Fri, Mar 29 2024, Ryan Prior wrote:

[...]

>>> Guix currently packages xz-utils 5.2.8 as "xz" using the upstream
>>> tarball. [...] Should we switch from using upstream tarballs to some
>>> fork with more responsible maintainers?
>>
>> Guix's habit of building from tarballs is a poor idea because tarballs
>> often differ.

First of all: can software be considered reproducible if it produces
different binaries depending on whether it is compiled from the source
code repository (git or another VCS) or from the officially released
source tarball?

My first thought is no.

>> For example, maintainers may choose to ship a ./configure script that
>> is otherwise not present in Git (although a configure.ac might be).
>> Guix should build from Git.

Two useful pointers explaining how the backdoor has been injected are
[1] (general workflow) and [2] (payload obfuscation).

The first and *indispensable* condition for the attack to be successful
is this:

--8<---cut here---start->8---
* The release tarballs upstream publishes don't have the same code that
  GitHub has. This is common in C projects so that downstream consumers
  don't need to remember how to run autotools and autoconf. The version
  of build-to-host.m4 in the release tarballs differs wildly from the
  upstream on GitHub.

[...]

* Explain dist tarballs, why we use them, what they do, link to
  autotools docs, etc

* "Explaining the history of it would be very helpful I think. It also
  explains how a single person was able to insert code in an open source
  project that no one was able to peer review. It is pragmatically
  impossible, even if technically possible once you know the problem is
  there, to peer review a tarball prepared in this manner."
--8<---cut here---end--->8---

(from [1])

Let me highlight this: «It is pragmatically impossible [...] to peer
review a tarball prepared in this manner.»

There is no doubt that the release tarball is a much weaker "trusted
source" (trusted by peer review, not by authority) than the upstream
DVCS repository.

It's *very* noteworthy that the backdoor was discovered thanks to a
performance issue and _not_ during a peer review of the source
code... the _build_ code *is* source code, no?

It's not the first time a source release tarball of free software is
compromised [3], but the way the compromise worked in this case is
something new (or at least never spotted before, right?).

> We discussed a bit on #guix today about this. A movement to sourcing
> more directly from Git in general has been discussed before, though
> has some hurdles.

Could someone knowledgeable about the details please describe what the
hurdles are to sourcing from a DVCS (possibly other than git)?

> I will let someone more knowledgeable about the details chime in, but
> yes, something we should do.

I'm definitely _not_ the knowledgeable one, but I'd like to share the
results of my research.

Is it possible to enhance our build-system(s) (e.g. gnu-build-system) so
that they can /ignore/ pre-built .m4 or similar scripts and rebuild them
during the build process?

Richard W.M. Jones on fedora-devel ML proposed [4]:

--8<---cut here---start->8---
(1) We should routinely delete autoconf-generated cruft from upstream
projects and regenerate it in %prep. It is easier to study the real
source rather than dig through the convoluted, generated shell script in
an upstream './configure' looking for back doors. For most projects,
just running "autoreconf -fiv" is enough.
--8<---cut here---end--->8---

There is an interesting