[gentoo-amd64] Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)

Duncan Thu, 07 Aug 2014 14:19:40 -0700

Mark Knecht posted on Thu, 07 Aug 2014 11:16:23 -0700 as excerpted:

> So that's all looking pretty good, as a first step. If it's a matter of
> 3 1/2 minutes instead of 1-2 minutes then I can live with that part.
> However that's just (I think) the portage tree and not signed source
> code, correct?


[I just posted a reply to the gpg specific stuff.]

Technically correct, but not really so in implementation.  See below...

> Now, is the idea that I have a validated portage snapshot at this point
> and stiff have to actually get the code using the regular emerge which
> will do the checking because I have:
> 
> FEATURES="buildpkg strict webrsync-gpg"

No...  It doesn't work that way.

> I don't see any evidence that emerge checked what it downloaded, but
> maybe those checks are only done when I really build the code?

Here's what happens.

FEATURES=webrsync-gpg simply tells the webrsync stuff to gpg-verify the 
snapshot-tarball that webrsync downloads.  Without that, it'd still 
download it the same, but it wouldn't verify the signature.  This allows 
people who use the webrsync only because they're behind a firewall that 
wouldn't allow normal rsync, but who don't care about the gpg signing 
security stuff, to use the same tool as the people who actually use 
webrsync for the security aspect, regardless of whether they could use 
normal rsync or not.

So that gets you a signed and verified tree.  Correct so far.

But as part of that tree, there are digest files for each package that 
verify the integrity of the ebuild as well as of the sources tarballs 
(distfiles).

Now it's important to grasp the difference between gpg signing and simple 
hash digests, here.

Anybody with the appropriate tools (md5sum, for example, does md5 hashes, 
but there's sha and other hashes as well, and the portage tree uses 
several hash algorithms in case one is broken) can take a hash of a file, 
and provided it's exactly the same bit-for-bit file they should get 
exactly the same hash.
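
To make that concrete, here's a minimal Python sketch (mine, not anything
portage ships).  The three algorithms are just an assumption picked
because Python's hashlib always provides them; the tree's actual set may
differ.

```python
import hashlib

def digests(data: bytes, algorithms=("sha256", "sha512", "blake2b")) -> dict:
    """Return {algorithm: hex digest} for the given bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}

original = b"example tarball contents\n"
same_copy = bytes(original)               # bit-for-bit identical copy
damaged = b"example tarball contentz\n"   # a single changed byte

print(digests(original) == digests(same_copy))  # True
print(digests(original) == digests(damaged))    # False
```

Taking several hashes at once is the "in case one is broken" part: an
attacker would have to find a collision in all of them simultaneously.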

In fact, that's how portage checks the hashes of both the ebuild files 
and the distfiles it uses, regardless of this webrsync-gpg stuff.  The 
tree ships the hash values that the gentoo package maintainer took of the 
files in its digest files, and portage takes its own hash of the files 
and compares it to the hash value stored in the digest files.  If they 
match, portage is happy.  If they don't, depending on how strict you have 
portage set to be (FEATURES=strict), it will either warn about the 
mismatch (without strict) or entirely refuse to merge that package (with 
strict), until either the digest is updated, or a new file matching the 
old digest is downloaded.
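
Roughly, the compare-and-decide logic looks like the sketch below.  This
is NOT portage's real code: check_file, DigestMismatch and the strict
parameter are hypothetical names I made up to illustrate the warn versus
refuse behavior.

```python
import hashlib
import warnings

class DigestMismatch(Exception):
    """Raised when a file does not match its recorded digest (strict mode)."""

def check_file(name: str, data: bytes, recorded: dict, strict: bool) -> bool:
    """Hash `data` afresh and compare against the recorded digests.

    Returns True on a match.  On a mismatch: raise if strict, otherwise
    warn and return False.
    """
    for algorithm, expected in recorded.items():
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected:
            message = f"{name}: {algorithm} digest mismatch"
            if strict:
                raise DigestMismatch(message)
            warnings.warn(message)
            return False
    return True

# The digests the maintainer recorded when the file was known good.
good = b"source tarball as the maintainer saw it"
recorded = {a: hashlib.new(a, good).hexdigest() for a in ("sha256", "sha512")}

print(check_file("foo-1.0.tar.gz", good, recorded, strict=True))  # True
try:
    check_file("foo-1.0.tar.gz", b"damaged download", recorded, strict=True)
except DigestMismatch as err:
    print("refusing to merge:", err)
```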

So far so good.  But while the hashes protect against accidental damage 
as the file was being downloaded, anyone can take a hash of a file.  So 
without something stronger, if say one of the mirror operators was a bad 
guy, he could replace the files with hacked files, and as long as he 
replaced the digest files with new ones created for the hacked files at 
the same time, portage wouldn't know.

So while hashes/digests alone protect quite well from accidental damage, 
they can't protect, by themselves, from deliberate replacement of those 
files with malware infested copies.
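
A toy demonstration of that weakness, with a made-up "mirror" that is
just a dict holding a file and its digest (all names here are
hypothetical, purely for illustration):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def passes_digest_check(mirror: dict) -> bool:
    """True if the file the mirror serves matches the digest it serves."""
    return sha256_hex(mirror["distfile"]) == mirror["digest"]

genuine = b"genuine source code"
mirror = {"distfile": genuine, "digest": sha256_hex(genuine)}

# Accidental damage: only the file changes, so the check fails.
corrupted = dict(mirror, distfile=b"genuine source c0de")

# Deliberate replacement: the attacker swaps the file AND recomputes
# the digest to match it.
hacked = b"source code with a backdoor"
malicious = {"distfile": hacked, "digest": sha256_hex(hacked)}

print(passes_digest_check(mirror))     # True
print(passes_digest_check(corrupted))  # False
print(passes_digest_check(malicious))  # True: the swap goes unnoticed
```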

Which is where the gpg signed tree snapshots come in.  But before we can 
understand how they help, we need to understand how gpg signing differs 
from simple hashes.

PGP, gpg, and various other public/private-pair key signing (and 
encryption) schemes take advantage of a particular mathematical 
relationship between the public and private keys.  I'm not a cryptographer 
nor a mathematician, so I'm content to leave it at that rather handwavy 
assertion and not get into the details, but enough people I trust say the 
same thing about the details, and enough of our modern Internet banking 
and the like, depends upon the same idea, that I'm relatively confident 
in the general principle, at least.

It works like this.  People keep the private key from the pair private -- 
if it gets out, they've lost the secret.  But people publish the public 
half of the key.  The relationship of the keys is such that people can't 
figure out the private key from the public key, but if you have the 
private key, you can sign stuff with it, and people with the public key 
can verify the signature and thus trust that it really was the person 
with that key that signed the content.  Similarly, people can use the 
public key to encrypt something, and only the person with the private key 
will be able to decrypt it -- having the public key doesn't help.

Actually, as I understand it signing is simply a combination of hashing 
and encryption, such that a hash of the content to be signed is taken, 
and then that hash is encrypted with the private key.  Now anyone with 
the public key can "decrypt" the hash and verify the content with it, 
thereby verifying that the matching private key was the one used to sign 
the content.  If some other key had been used, 
attempting to decrypt the hash with an unmatched public key would simply 
produce gibberish, and the supposedly "decrypted" hash wouldn't be the 
hash produced when checking the content, thereby failing to verify that 
the signed content actually came from the person that it was claimed to 
have come from.
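
Here's that hash-then-"encrypt" idea as a toy Python sketch.  Loud
caveats: this is textbook RSA with small fixed primes, which is my own
choice for illustration.  It is NOT how gpg is implemented and is not
secure (real schemes add padding and use large random keys), and every
name in it is hypothetical.

```python
import hashlib

def make_keypair(p: int, q: int, e: int = 65537):
    """Build a toy textbook-RSA key pair from two primes (illustration only)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent; needs Python 3.8+
    return (n, e), (n, d)              # (public key, private key)

def content_hash(data: bytes, n: int) -> int:
    """SHA-256 of the content as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes, private_key) -> int:
    n, d = private_key
    return pow(content_hash(data, n), d, n)  # "encrypt" the hash

def verify(data: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    # "Decrypt" the signature and compare with a freshly taken hash.
    return pow(signature, e, n) == content_hash(data, n)

# Mersenne primes keep the example short; real keys are random and larger.
signer_pub, signer_priv = make_keypair(2**61 - 1, 2**89 - 1)
other_pub, other_priv = make_keypair(2**107 - 1, 2**127 - 1)

content = b"portage snapshot contents"
signature = sign(content, signer_priv)

print(verify(content, signature, signer_pub))               # True
print(verify(b"tampered contents", signature, signer_pub))  # False
# Signed with some other private key, checked against the signer's public key:
print(verify(content, sign(content, other_priv), signer_pub))  # False
```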


OK, we've now established that hashes simply verify that the content 
didn't get modified in transit, but they do NOT by themselves verify who 
SENT that content, so indeed, a man-in-the-middle could have replaced 
BOTH the content and the hash, and someone relying on just hashes 
couldn't tell the difference.

And we've also established that a signature verifies that the content 
actually came from the person who had the private key matching the public 
key used to verify it, by mechanism of encrypting the hash of that 
content with the private key, so only by "decrypting" it with the 
matching public key, does the hash of the content match the one taken at 
the other end and encrypted with the private key.

*NOW* we're equipped to see how the portage tree snapshot signing method 
actually allows us to verify distfiles as well.  Because the tree 
includes digests that we can now verify came from our trusted source, 
gentoo, NOW those digests can be used to verify the distfiles, because 
the digests were part of the signed tree and nobody could tamper with 
that signed tree including those digests without detection.

If our nefarious gentoo mirror operator tried to switch out the source 
tarballs AND the digests, he could do so for normal rsync users, and for 
webrsync users not doing gpg verification, without detection.  But should 
he try that with someone that's using webrsync-gpg, he has no way to sign 
the tampered-with tarball with the correct private key since he doesn't 
have it, and those using webrsync with FEATURES=webrsync-gpg would detect 
the tampered tarball as portage (via webrsync, via eix in your case) 
would reject that tarball as unverified.

So the hash-digest method used to protect ordinary rsync users (and 
webrsync users without webrsync-gpg turned on) from ACCIDENTAL damage, 
now protects webrsync-gpg users from DELIBERATE man-in-the-middle attacks 
as well, not because the digests themselves are different, but because we 
can now trust and verify that they came from a legitimate source.
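
Putting the two steps together, again only as a sketch under loud
assumptions: the signature is toy textbook RSA standing in for gpg, the
"snapshot" is a single line of bytes carrying one digest, and all the
names are hypothetical.

```python
import hashlib

# Toy textbook-RSA key (illustration only; the real tool is gpg).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, known only to the signer

def h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    return pow(h(data), D, N)

def signature_ok(data: bytes, signature: int) -> bool:
    return pow(signature, E, N) == h(data)

def distfile_trusted(snapshot: bytes, signature: int, distfile: bytes) -> bool:
    """Step 1: verify the snapshot signature.  Step 2: check the distfile
    against the digest carried inside the now-trusted snapshot."""
    if not signature_ok(snapshot, signature):
        return False
    return hashlib.sha256(distfile).hexdigest().encode() in snapshot

genuine = b"genuine source"
snapshot = b"foo-1.0.tar.gz " + hashlib.sha256(genuine).hexdigest().encode()
signature = sign(snapshot)

# Mirror attack: new distfile and new digest, but no way to re-sign.
hacked = b"backdoored source"
forged = b"foo-1.0.tar.gz " + hashlib.sha256(hacked).hexdigest().encode()

print(distfile_trusted(snapshot, signature, genuine))  # True
print(distfile_trusted(forged, signature, hacked))     # False: signature fails
print(distfile_trusted(snapshot, signature, hacked))   # False: digest fails
```

The digest check in step 2 is the same one everybody gets.  The only new
thing is step 1, which is what makes the digests themselves trustworthy.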

Tho it should be noted that "legitimate source" is defined as anyone 
having access to that private signing key.  So should someone break in 
to the snapshotting server and steal the private key doing the signing, 
they now become a "legitimate source" as far as webrsync-gpg is concerned.


So where does that leave us in practice?

Basically here:

You're now verifying that the snapshot tarballs are coming from a source 
with the private signing key, and we're assuming that gentoo security 
hasn't been broken and thus that only gentoo's snapshot signing servers 
(and their admins, of course) have access to the private signing key, 
which in turn means we're assuming the machine with that signing key must 
be gentoo, and thus that the snapshotted tarballs are legit.

But it's actually webrsync in combination with FEATURES=webrsync-gpg 
that's doing that verification.

Once the verified tarball is actually unpacked on our system, portage 
operates just as it normally does, simply verifying the usual hash digests 
against the ebuilds and the distfiles /exactly/ as it normally would.

Repeating in different words to hopefully ensure it's understood:

It's *ONLY* the fact that we have actually gpg-verified that snapshot 
tarball and thus the digests within it, that gives us any more security 
than an ordinary rsync user.  After that's downloaded, verified and 
unpacked, portage operates exactly as it normally does.


Meanwhile, part of that normal operation includes FEATURES=strict, if 
you've set it, which causes portage to refuse to merge the package if 
those digests don't match.  But that part of things is just normal 
portage operation.  Rsync users get it too -- they just don't have the 
additional assurance that those digest files actually came from gentoo 
(or at least from someone with gentoo's private signing key), that 
webrsync with FEATURES=webrsync-gpg provides.


(Meanwhile, one further personal note FWIW.  You may think that all these 
long explanations take quite some time to type up, and you'd be correct.  
But don't make the mistake of thinking that I don't get a benefit from it 
myself.  My dad was a teacher, and one of the things he used to say that 
I've found to be truer than true, is that the best way to /learn/ 
something is to try to teach it to someone.  That's exactly what I'm 
doing, and all the unexpected questions and corner cases that I'd have 
never thought about on my own, that people bring up and force me to think 
about in order to answer them, help me improve my own previously more 
handwavy and fuzzy "general concept" understanding as well.  I'm much 
more confident in my own understanding of the general public/private key 
concepts, how gpg actually uses them and how its web-of-trust works, and 
more specifically, how portage can use that via webrsync-gpg to actually 
improve the gentooer's own security, than I ever was before.

And it has been quite some time since I worked with gpg and saw it in 
interactive mode like that, too, and it turns out that in the intervening 
years, I've actually understood quite a bit more about how it all works 
than I did back then, thus my ability to dig that all up and present it 
here, while a few years ago, I was just as clueless about how all 
that web-of-trust stuff worked, and made exactly the same mistake of 
"ultimately trusting" the distro's package-signing key, for exactly the 
same reasons.  Turns out I absorbed rather more from all those security 
and encryption articles I've read over the years than I realized, but it 
actually took my replies right here in this thread to lay it all out 
logically so I too realized how much more I understand what's going on 
now, than I did back then.)

So... Thanks for the thread! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
