Possible vulnerability to SHA-1 collisions

2012-11-24 Thread Michael Hirshleifer
Evil Guy creates 2 files, 1 evil and 1 innocuous, with the same SHA-1 
checksum (including Git header). Mr. Evil creates a local branch with an 
innocuous name like “test-bugfix”, and adds a commit containing a 
reference to the evil file. Separately, using a sockpuppet, Evil Guy 
creates an innocuous bugfix (very likely to be accepted) containing the 
innocuous file, and submits it to Good Guy. Before Good Guy can commit 
the bugfix, Evil Guy pushes the evil branch to Github, and then 
immediately deletes it; or equivalently --force pushes any innocuous 
commit on top of it. (This is unlikely to arouse suspicion, and he can 
always say he deleted it because it didn’t work.)
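
For reference, "including Git header" matters because Git does not hash the raw file bytes alone. A minimal sketch of how a blob's object name is derived is below; the function name git_blob_id is made up for illustration and is not part of Git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object name Git assigns to a blob with this content."""
    # Git hashes a header ("blob <size>" followed by a NUL byte) and then
    # the raw bytes, so a colliding pair of files has to collide with this
    # header included, not just as bare files.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

if __name__ == "__main__":
    print(git_blob_id(b""))         # object name of the empty blob
    print(git_blob_id(b"hello\n"))  # same value `git hash-object` reports
```

Since the header contains the length, the two colliding files would be hashed with the same prefix only if they have the same size.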


Git keeps unreferenced objects around for a few weeks, so when Good Guy 
commits the patch and pushes to Github, an object with an sha1sum that 
matches the good file will already exist in the main repository. Since 
Git keeps the local copy of files when sha1sums match, the main Github 
repository will then contain the evil file associated with Good Guy’s 
commit. Any users cloning from Github will get the evil version. This is 
an exploit.


And Good Guy’s local repository will contain the good file; he will not 
notice anything amiss unless he nukes his local repository and clones 
from Github again. Even when the compromise is discovered, there will be 
no reason to suspect Evil Guy; the evil file seems to have been 
committed by Good Guy.


Previous discussion about hash collisions in Git seems to conclude that 
they aren’t a security threat. See 
http://stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob/9392525#9392525, 
Linus Torvalds arguing that Git’s security doesn’t depend on SHA-1 
collision resistance.


This proposed exploit does not involve social engineering, or any good 
guys failing to spot or accepting patches containing evil data (what 
Good Guy accepts is a genuine bugfix). It contaminates the main public 
repository in a way that Good Guy won’t immediately notice. It does not 
require a second-preimage attack; Evil Guy creates both versions of the 
file. While this does require Evil Guy to have commit access, he can 
avoid suspicion after the attack.



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Possible vulnerability to SHA-1 collisions

2012-11-24 Thread Shawn Pearce
I don't think there is an issue the way you have tried to describe
this scenario.

On Sat, Nov 24, 2012 at 3:12 AM, Michael Hirshleifer <111...@caltech.edu> wrote:
> Evil Guy creates 2 files, 1 evil and 1 innocuous, with the same SHA-1
> checksum (including Git header). Mr. Evil creates a local branch with an
> innocuous name like “test-bugfix”, and adds a commit containing a reference
> to the evil file. Separately, using a sockpuppet, Evil Guy creates an
> innocuous bugfix (very likely to be accepted) containing the innocuous file,
> and submits it to Good Guy. Before Good Guy can commit the bugfix, Evil Guy
> pushes the evil branch to Github, and then immediately deletes it; or
> equivalently --force pushes any innocuous commit on top of it. (This is
> unlikely to arouse suspicion, and he can always say he deleted it because it
> didn’t work.)

Here you assume Evil Guy has write access to the same repository as
Good Guy. Let's assume this is possible, e.g. Evil Guy is actually
impersonating White Hat because he managed to steal White Hat's
credentials through a compromised host. Typically Evil Guy doesn't
have write access to Good Guy's repository, and thus can't introduce
objects into it without Good Guy being the one that creates the
objects.

But let's just keep the assumption that Evil Guy can write to the same
repository as Good Guy, and that he managed to create the bad branch
and delete it, leaving the bad object in an unreachable state for 2
weeks.

> Git keeps unreferenced objects around for a few weeks, so when Good Guy
> commits the patch and pushes to Github, an object with an sha1sum that
> matches the good file will already exist in the main repository. Since Git
> keeps the local copy of files when sha1sums match, the main Github
> repository will then contain the evil file associated with Good Guy’s
> commit. Any users cloning from Github will get the evil version. This is an
> exploit.

Typically... Git will fail with an error message when Good Guy pushes.
Good Guy's client will (rightly) believe that the object doesn't exist
on the remote side, after all it is unreachable. So his client will
include it in the pack being transmitted during push. When this pack
arrives on the remote side, the remote will identify it already has an
object named the same as an object coming in the pack. The remote will
do a byte-for-byte compare of both objects. As soon as a single byte
differs, it will abort with an error.
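
The check described above can be sketched roughly as follows. This is a toy in-memory model, not Git's actual code: the ObjectStore class and CollisionError are hypothetical names, and the object name is passed in by the caller so that a collision can be simulated without having a real SHA-1 collision.

```python
class CollisionError(Exception):
    """An incoming object has the same name as a stored one but different bytes."""


class ObjectStore:
    """Toy model of a receiving repository's object database."""

    def __init__(self):
        self._objects = {}  # object name -> raw object bytes

    def add(self, name: str, data: bytes) -> bool:
        """Store data under name; return True if it was new.

        If an object with this name already exists, compare the two
        byte for byte and abort on the first difference instead of
        silently keeping either copy.
        """
        existing = self._objects.get(name)
        if existing is None:
            self._objects[name] = data
            return True
        if existing != data:
            raise CollisionError("object %s already exists with different content" % name)
        return False  # identical duplicate, nothing to do


if __name__ == "__main__":
    remote = ObjectStore()
    remote.add("deadbeef", b"evil file")  # left behind by the deleted branch
    try:
        remote.add("deadbeef", b"innocuous file")  # Good Guy's push
    except CollisionError as err:
        print("push rejected:", err)
```

The point of the sketch is only that the receiving side neither overwrites the old object nor quietly drops the new one; the mismatch surfaces as an error.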

At this point Good Guy can't push to his repository. `git gc
--prune=now` will fix the repository by removing the unreachable
object, at which point Evil Guy's evil object is gone.

> And Good Guy’s local repository will contain the good file; he will not
> notice anything amiss unless he nukes his local repository and clones from
> Github again. Even when the compromise is discovered, there will be no
> reason to suspect Evil Guy; the evil file seems to have been committed by
> Good Guy.

See above. Good Guy would have noticed something is amiss because the
object he sent already existed and didn't match.

> Previous discussion about hash collisions in Git seems to conclude that they
> aren’t a security threat. See
> http://stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob/9392525#9392525,
> Linus Torvalds arguing that Git’s security doesn’t depend on SHA-1 collision
> resistance.

This is largely true because there are additional defenses (e.g. the
byte for byte compare on identical objects), and for projects like the
Linux kernel there are many eyes looking at files all of the time.
Anything that is amiss would be announced quickly on LKML and
discussed until the root cause is identified and resolved.


Re: Possible vulnerability to SHA-1 collisions

2012-11-27 Thread Jeff King
On Sat, Nov 24, 2012 at 10:09:31AM -0800, Shawn O. Pearce wrote:

> On Sat, Nov 24, 2012 at 3:12 AM, Michael Hirshleifer <111...@caltech.edu> 
> wrote:
> > Evil Guy creates 2 files, 1 evil and 1 innocuous, with the same SHA-1
> > checksum (including Git header). Mr. Evil creates a local branch with an
> > innocuous name like “test-bugfix”, and adds a commit containing a reference
> > to the evil file. Separately, using a sockpuppet, Evil Guy creates an
> > innocuous bugfix (very likely to be accepted) containing the innocuous file,
> > and submits it to Good Guy. Before Good Guy can commit the bugfix, Evil Guy
> > pushes the evil branch to Github, and then immediately deletes it; or
> > equivalently --force pushes any innocuous commit on top of it. (This is
> > unlikely to arouse suspicion, and he can always say he deleted it because it
> > didn’t work.)
> 
> Here you assume Evil Guy has write access to the same repository as
> Good Guy. Lets assume this is possible, e.g. Evil Guy is actually
> impersonating White Hat because he managed to steal White Hat's
> credentials through a compromised host. Typically Evil Guy doesn't
> have write access to Good Guy's repository, and thus can't introduce
> objects into it without Good Guy being the one that creates the
> objects.
> 
> But lets just keep he assumption that Evil Guy can write to the same
> repository as Good Guy, and that he managed to create the bad branch
> and delete it, leaving the bad object in an unreachable state for 2
> weeks.

Actually, it is somewhat easier on GitHub, because we share objects
between forks of a repository via the alternates mechanism. So if you
can publicly fork the project and push a branch to your fork, you can
write to the shared object database. This applies not just to GitHub,
but to any hosting service which shares object databases between
projects (I do not know offhand if other hosting providers like Google
Code do this).
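
As a rough illustration of what sharing via alternates means for object lookup, here is a simplified sketch. It only handles loose objects and one level of alternates (real Git also searches packs and follows nested alternates), and the function names object_dirs and find_loose_object are invented for this example. How any particular hosting service lays out its forks on disk is not something this sketch claims to know.

```python
import os

def object_dirs(objects_dir):
    """Yield objects_dir itself, then each directory named in info/alternates."""
    yield objects_dir
    alt_file = os.path.join(objects_dir, "info", "alternates")
    if os.path.exists(alt_file):
        with open(alt_file) as f:
            for line in f:
                path = line.strip()
                if path and not path.startswith("#"):
                    # Relative entries are resolved against the object
                    # directory that contains the alternates file.
                    yield os.path.normpath(os.path.join(objects_dir, path))

def find_loose_object(objects_dir, name):
    """Return the path of loose object `name`, searching alternates too."""
    for d in object_dirs(objects_dir):
        # Loose objects live at <objects>/<first 2 hex chars>/<remaining 38>.
        candidate = os.path.join(d, name[:2], name[2:])
        if os.path.exists(candidate):
            return candidate
    return None
```

With a layout like this, an object written to the shared store through one fork is visible to every other repository that lists that store as an alternate.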

But as you noted later in your email, the byte-for-byte comparison on
object collision will let us detect this case when the good guy tries to
push and abort.

-Peff

PS I also think the OP's "sockpuppet creates innocuous bugfix" above is
   easier said than done. We do not have SHA-1 collisions yet, but if
   the md5 attacks are any indication, the innocuous file will not be
   completely clean; it will need to have some embedded binary goo that
   is mutated randomly during the collision process (which is why the
   md5 attacks were demonstrated with postscript files which _rendered_
   to look good, but contained a chunk of random bytes in a spot ignored
   by the postscript interpreter).


Re: Possible vulnerability to SHA-1 collisions

2012-11-27 Thread Aaron Schrab

At 18:07 -0500 27 Nov 2012, Jeff King  wrote:

> PS I also think the OP's "sockpuppet creates innocuous bugfix" above is
>   easier said than done. We do not have SHA-1 collisions yet, but if
>   the md5 attacks are any indication, the innocuous file will not be
>   completely clean; it will need to have some embedded binary goo that
>   is mutated randomly during the collision process (which is why the
>   md5 attacks were demonstrated with postscript files which _rendered_
>   to look good, but contained a chunk of random bytes in a spot ignored
>   by the postscript interpreter).


I don't think that really saves us though.  Many formats have parts of 
the file which will be ignored, such as comments in source code.  With 
the suggested type of attack, there isn't a requirement about which 
version of the file is modified.  So the attacker should be able to 
generate a version of a file with an innocuous change, get the SHA-1 for 
that, then add garbage comments to their malicious version of the file 
to try to get the same SHA-1.



Re: Possible vulnerability to SHA-1 collisions

2012-11-27 Thread Jeff King
On Tue, Nov 27, 2012 at 06:30:17PM -0500, Aaron Schrab wrote:

> At 18:07 -0500 27 Nov 2012, Jeff King  wrote:
> >PS I also think the OP's "sockpuppet creates innocuous bugfix" above is
> >  easier said than done. We do not have SHA-1 collisions yet, but if
> >  the md5 attacks are any indication, the innocuous file will not be
> >  completely clean; it will need to have some embedded binary goo that
> >  is mutated randomly during the collision process (which is why the
> >  md5 attacks were demonstrated with postscript files which _rendered_
> >  to look good, but contained a chunk of random bytes in a spot ignored
> >  by the postscript interpreter).
> 
> I don't think that really saves us though.  Many formats have parts
> of the file which will be ignored, such as comments in source code.

Agreed, it does not save us unconditionally. It just makes it harder to
execute the attack. Would you take a patch from a stranger that had a
kilobyte of binary garbage in a comment?

A more likely avenue would be a true binary file where nobody is
expected to read the diff.

> With the suggested type of attack, there isn't a requirement about
> which version of the file is modified.  So the attacker should be
> able to generate a version of a file with an innocuous change, get
> the SHA-1 for that, then add garbage comments to their malicious
> version of the file to try to get the same SHA-1.

That's not how birthday collision attacks usually work, though. You do
not get to just mutate the malicious side and leave the innocuous side
untouched. You are mutating both sides over and over and hoping to find
a matching sha1 from the "good" and "evil" sides.
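
To make the "mutating both sides" point concrete, here is a toy birthday search against SHA-1 truncated to 3 bytes. It is only a sketch: the truncation is there so the search finishes in a moment, the helper names short_hash and birthday_pair are made up, and real collision attacks on md5 or sha-1 rely on cryptanalysis rather than this kind of brute force.

```python
import hashlib

def short_hash(data: bytes, nbytes: int = 3) -> bytes:
    """First nbytes of SHA-1: a deliberately weak stand-in for the full hash."""
    return hashlib.sha1(data).digest()[:nbytes]

def birthday_pair(good: bytes, evil: bytes, nbytes: int = 3):
    """Mutate BOTH files until some good variant matches some evil variant.

    Each round appends a counter in a trailing comment to both files,
    records the short hash of each variant, and checks whether the new
    hash already appeared on the other side.
    Returns (good_variant, evil_variant, rounds_used).
    """
    seen_good = {}  # short hash -> good variant that produced it
    seen_evil = {}  # short hash -> evil variant that produced it
    i = 0
    while True:
        g = good + b"\n# %d\n" % i
        e = evil + b"\n# %d\n" % i
        hg, he = short_hash(g, nbytes), short_hash(e, nbytes)
        seen_good[hg] = g
        seen_evil[he] = e
        if hg in seen_evil:
            return g, seen_evil[hg], i
        if he in seen_good:
            return seen_good[he], e, i
        i += 1

if __name__ == "__main__":
    g, e, rounds = birthday_pair(b"print('bugfix')", b"print('backdoor')")
    print(rounds, short_hash(g).hex(), short_hash(e).hex())
```

Note that neither returned file is the untouched original; both carry a mutated tail, which is the "goo" that would have to survive review.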

Of course, I have not been keeping up too closely with the efforts to
break sha-1. Maybe there is something more nefarious about the current
attacks. I am just going off my recollection of the md5 collision
attacks.

-Peff


Re: Possible vulnerability to SHA-1 collisions

2012-11-28 Thread Andreas Ericsson
On 11/28/2012 01:27 AM, Jeff King wrote:
> On Tue, Nov 27, 2012 at 06:30:17PM -0500, Aaron Schrab wrote:
> 
>> At 18:07 -0500 27 Nov 2012, Jeff King  wrote:
>>> PS I also think the OP's "sockpuppet creates innocuous bugfix" above is
>>>   easier said than done. We do not have SHA-1 collisions yet, but if
>>>   the md5 attacks are any indication, the innocuous file will not be
>>>   completely clean; it will need to have some embedded binary goo that
>>>   is mutated randomly during the collision process (which is why the
>>>   md5 attacks were demonstrated with postscript files which _rendered_
>>>   to look good, but contained a chunk of random bytes in a spot ignored
>>>   by the postscript interpreter).
>>
>> I don't think that really saves us though.  Many formats have parts
>> of the file which will be ignored, such as comments in source code.
> 
> Agreed, it does not save us unconditionally. It just makes it harder to
> execute the attack. Would you take a patch from a stranger that had a
> kilobyte of binary garbage in a comment?
> 
> A more likely avenue would be a true binary file where nobody is
> expected to read the diff.
> 
>> With the suggested type of attack, there isn't a requirement about
>> which version of the file is modified.  So the attacker should be
>> able to generate a version of a file with an innocuous change, get
>> the SHA-1 for that, then add garbage comments to their malicious
>> version of the file to try to get the same SHA-1.
> 
> That's not how birthday collision attacks usually work, though. You do
> not get to just mutate the malicious side and leave the innocuous side
> untouched. You are mutating both sides over and over and hoping to find
> a matching sha1 from the "good" and "evil" sides.
> 
> Of course, I have not been keeping up too closely with the efforts to
> break sha-1. Maybe there is something more nefarious about the current
> attacks. I am just going off my recollection of the md5 collision
> attacks.
> 

AFAIR, collision attacks can be executed with roughly 2^51 work (against
a claimed 2^80, that's pretty bad), but preimage attacks are still stuck
very close to the claimed 2^160.

That means any practical attack on SHA1 requires Mr. Malicious to
create both of the files involved, or to do exceptional research
without sharing it.

I think git's job is to make sure that write access to only one of
the repositories is insufficient to launch an attack. If the attacker
manages to change all repositories involved then the hash function
used is really quite irrelevant.

-- 
Andreas Ericsson   andreas.erics...@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.