Re: Files with identical SHA1 breaks the repo

2017-03-02 Thread Daniel Shahaf
Garance A Drosehn wrote on Wed, Mar 01, 2017 at 14:48:07 -0500:
> On 1 Mar 2017, at 7:18, Daniel Shahaf wrote:
> 
> > Stefan Sperling wrote on Wed, Mar 01, 2017 at 11:01:40 +0100:
> >> On Tue, Feb 28, 2017 at 10:17:34PM -0600, Greg Stein wrote:
> >>> I really like this idea.
> >>>
> >>> And we could take a copy of APR's sha1 code, and rejigger
> >>> it to perform *both* hashes during the same scan of the
> >>> raw bytes. I would expect the time taken to extend by
> >>> (say) 1.1X rather than a full 2X. The inner loop might
> >>> cost a bit more, but we'd only scan the bytes once. Very
> >>> handy, when you're talking about megabytes in a stream-y
> >>> environment.
> >>>
> >>> (and medium-term, push this dual-sha1 computation back
> >>> into APR)
> >>
> >> The idea is nice and I would support its application.
> >
> > I would not.
> >
> > Using two variants of sha1 is fine for supporting webkit's
> > use-case, or to protect admins from disgruntled employees
> > who commit the known collision to DoS their employer.
> >
> > However, when the attacker model is a competent/resourceful
> > attacker, using two variants of sha1 only increases the
> > attack's complexity by a factor of about 2⁷ ([1, §4.1]),
> > and might in fact cause collisions to be _easier_ to find
> > (since additional state is output).
> >
> > So let's not add a variant of sha1.
> 
> Note that this is not a "variant" of sha1, which would imply
> that it is a *different* algorithm.  It's the same algorithm,
> run with two different inputs.
> 
> The paper you reference:
> https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf
> 
> is interesting, and some of it might be going over my head.
> However, I'll note that it seems to be about:
> 
>"... unwanted dependencies between two slightly
> different instances of the same hash function"
> 
> That is not what I am suggesting.  I am suggesting to use
> the exact same hash function, but giving the function two
> vastly different collections of data.

I understood what you were suggesting, and it's precisely what the paper
talks about.  Consider the function that takes as input a file (more
accurately: a byte stream) and returns the sha1 of every other word of
that file; that function is a hash function and is a variant of sha1.

In terms of that paper, you can think of F as sha1 and G as "sha1 of
every other word" (or the other way around).

> And not only that,
> but there is an ordering dependency between the two sets of
> data.  It's mighty hard to add data to the first set (the
> entire file) without causing significant disruption in the
> *actual data* that will be fed to the second set.
> 
> I do not see how this extended-sha1 would be easier to break
> than the current sha1, because it includes the current sha1,
> unchanged.

In cryptography, "it appears to be hard" and "I see no way to break it"
don't count: the fact that one does not see an attack does not mean
that no attack exists.  (Absence of evidence isn't evidence of absence.)

What does count are security proofs ("No attacker with resources X can
achieve Y in time T with probability exceeding 1/2**N") and, failing
that, constructions that have stood the test of time.  Nothing either of
us comes up with off the top of his head is going to meet either of
these criteria.

Regarding "easier to break", you were probably thinking of collision
attacks, which are never made easier by extending the output.  I was
thinking of "first preimage" attacks (finding a message that has a given
checksum).  Extending the output can make _these_ attacks easier.  For
example, if I told you my PIN is four decimal digits you wouldn't know
what it was, but if I told you not only the number of digits but also
that their sum was 36, you'd know the PIN was 9999.

> And then it adds a second sha1 digest to that,
> by effectively hashing a *different file* (the file created
> by using only every-other-byte of the original file).
> 
> Obviously this is also not perfect, but then there will
> never be a hashing algorithm which can perfectly hash all
> 1-megabyte files of arbitrary data into unique 256-byte
> values. But it sure seems like it will make it much harder
> to *calculate* a second file which will sum-up to the same
> digest values (two values) as any given datafile.

That's called a "second preimage attack".  The extended construction is
not secure against such an attack: a hash function with a 320-bit output
is supposed to require O(2³²⁰) evaluations to find a second preimage,
but it's trivial to find a second preimage for F and G simultaneously
with only O(2¹⁶⁰) evaluations.  (Do you see that?)

A similar argument holds for collision attacks: a collision attack
against a perfect hash function with a 320-bit output should have
complexity 2¹⁶⁰; however, the paper provides an attack with
complexity 2⁸⁸.  There is also a trivial attack, specialized to these
particular F and G, with complexity 2⁸⁰.

In both cases, the attacks cost about as much as the corresponding
attacks against plain sha1 itself.
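
(To make the trivial attacks concrete, a sketch of my own, reusing the
toy G above with its assumed 4-byte word: any two messages that differ
only in words G never reads are indistinguishable to G, so a generic
2⁸⁰ birthday collision on plain sha1 restricted to such messages, or a
2¹⁶⁰ brute-force second preimage of F found the same way, breaks both
hashes at once.)

[[[
import hashlib, os

def G(data, word_size=4):
    # Toy "sha1 of every other word", as in the sketch above.
    kept = b"".join(data[i:i + word_size]
                    for i in range(0, len(data), 2 * word_size))
    return hashlib.sha1(kept).hexdigest()

msg = bytearray(os.urandom(64))
variant = bytearray(msg)
variant[4:8] = os.urandom(4)                 # touch only a word G skips
assert G(bytes(msg)) == G(bytes(variant))    # G cannot tell them apart
]]]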

Suspected issue with "remove-zombie-locks.py"?

2017-03-02 Thread Doug Robinson
Folks:

The following commands were run on CentOS 6.x with WANdisco Subversion 1.9.5
and Python 2.7:
--
# svn export https://svn.apache.org/repos/asf/subversion/trunk/contrib/hook-scripts
# svnadmin create repo
# svn co file:///path/to/repo wc
# cd wc
# touch wc/a; svn add wc/a; svn ci -mm wc
# svn lock wc/a; svn rm wc/a; svn ci -mm --no-unlock wc
# hook-scripts/remove-zombie-locks.py repo1 all

Removing all zombie locks from repository at /root/repo
This may take several minutes...
/a
Traceback (most recent call last):
  File "hook-scripts/remove-zombie-locks.py", line 214, in <module>
    main()
  File "hook-scripts/remove-zombie-locks.py", line 208, in main
    remover.run()
  File "hook-scripts/remove-zombie-locks.py", line 138, in run
    self.unlock_nonexistent_files, self.pool)
  File "/opt/rh/python27/root/usr/lib64/python2.7/site-packages/libsvn/fs.py", line 860, in svn_fs_get_locks2
    return _fs.svn_fs_get_locks2(*args)
svn.core.SubversionException: 160034 - No username is currently associated with filesystem 'd52a5571-d2a3-4db8-a9e5-a081d6b4b455'

--
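
For reference, error 160034 is the FS layer reporting that no username
is associated with the filesystem; my guess is that the unlock call made
inside the lock-listing callback is what needs one.  Below is only a
rough, untested sketch (the username is an arbitrary placeholder, and
the function names are the SWIG wrappers I would expect) of attaching an
access context before the unlock pass:

[[[
# Untested sketch: give the filesystem an access context (username)
# before walking and unlocking.  "lock-cleaner" is a placeholder.
import svn.core
import svn.fs
import svn.repos

pool = svn.core.Pool()
repos_ptr = svn.repos.svn_repos_open('/root/repo', pool)
fs_ptr = svn.repos.svn_repos_fs(repos_ptr)

access = svn.fs.svn_fs_create_access('lock-cleaner', pool)
svn.fs.svn_fs_set_access(fs_ptr, access)
# ... then run the same get_locks/unlock logic as remove-zombie-locks.py ...
]]]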


Is this a known issue?  Or should I file a bug?


Thank you.


Doug
-- 
DOUGLAS B. ROBINSON, SENIOR PRODUCT MANAGER
T: 925-396-1125
E: doug.robin...@wandisco.com
www.wandisco.com



Re: Files with identical SHA1 breaks the repo

2017-03-02 Thread Garance A Drosehn

On 2 Mar 2017, at 6:37, Daniel Shahaf wrote:


> Garance A Drosehn wrote on Wed, Mar 01, 2017 at 14:48:07 -0500:
>>
>> I do not see how this extended-sha1 would be easier to
>> break than the current sha1, because it includes the
>> current sha1, unchanged.
>
> Regarding "easier to break", you were probably thinking of
> collision attacks, which are never made easier by extending
> the output.  I was thinking of "first preimage" attacks
> (finding a message that has a given checksum).  Extending
> the output can make _these_ attacks easier.


Yes, in the context of subversion I'm thinking of collision
attacks, and only collision attacks.


> That's called a "second preimage attack".  The extended
> construction is not secure against such an attack: a hash
> function with a 320-bit output is supposed to require
> O(2³²⁰) evaluations to find a second preimage, but it's
> trivial to find a second preimage for F and G simultaneously
> with only O(2¹⁶⁰) evaluations.  (Do you see that?)


But consider that sha1 is a 160-bit output.  With the paper
that Google just released, did they need to do O(2¹⁶⁰)
evaluations to create a collision?

When I looked at their announcement at

https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

I noticed the following comments:

  "In practice, collisions should never occur for secure
   hash functions. However if the hash algorithm has some
   flaws, as SHA-1 does, a well-funded attacker can craft
  ^
   a collision."

As to how much work it was:
  "- 6,500 years of CPU computation to complete the
 attack first phase.
   - 110 years of GPU computation to complete the
 second phase."

And:
  "While those numbers seem very large, the SHA-1
   shattered attack is still more than 100,000 times
   faster than a brute force attack, which remains
   impractical."

My thinking is that their attack is not a brute-force attack,
and thus assuming the ideal of O(2¹⁶⁰) is misleading.

   "In 2013, Marc Stevens published a paper that outlined
a theoretical approach to create a SHA-1 collision"

I'll admit that I have no idea how his theoretical insight
would apply to a double-sha1 hash.  But if all it does is
get us back to O(2¹⁶⁰) (despite using 320-bits!), that
leaves us 100,000 times better off than we currently are.


>> [...].  But it might be
>> faster to use this tactic instead of making the much more
>> substantial change of totally dropping sha1 for sha256.
>>
>> (I don't mean it will necessarily be faster to implement,
>> but just that it might run faster in day-to-day operations)
>
> I concede the performance issue.
>
> Cheers,


Indeed, performance is the only reason I suggested this.  Some
comments in this thread were concerned about the performance hit
of moving to sha256.

I was also thinking it *might* be easier to make a double-sha1 
repository backwards-compatible with older versions of svn.

But I do not know anything about the code in subversion, so
that's total speculation on my part.

Thanks for the extra info.  The paper was interesting, even
if some of it went over my head...

--
Garance Alistair Drosehn= dro...@rpi.edu
Senior Systems Programmer   or   g...@freebsd.org
Rensselaer Polytechnic Institute; Troy, NY;  USA


RE: Files with identical SHA1 breaks the repo

2017-03-02 Thread Markus Schaber
Hi,

I think there are two different points here; these are my thoughts:


The first point is that both Repository and Client should work fine in case of 
collisions. Whatever algorithm we choose, we can never be sure that there won't 
be a collision.

That could be achieved, e.g., by a mandatory full-text comparison whenever the 
hashes match: if the hashes match but the full text is not identical, the cached 
rep simply isn't used. (I'm not sure about the performance implications, however.)

Similarly, the pristine store could put an "ABCD_2" file next to the "ABCD" 
file if the hashes match but the full text does not.
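
As a toy illustration of what I mean (my own sketch with made-up names;
a real pristine store would need streaming and locking), the store could
fall back to suffixed names whenever the full text differs:

[[[
import hashlib
import os

def store_pristine(root, data):
    # Store `data` under its sha1; on a hash hit, compare the full text
    # and fall back to "<sha1>_2", "<sha1>_3", ... for true collisions.
    digest = hashlib.sha1(data).hexdigest()
    suffix = 0
    while True:
        name = digest if suffix == 0 else "%s_%d" % (digest, suffix + 1)
        path = os.path.join(root, name)
        if not os.path.exists(path):
            with open(path, "wb") as f:       # first file with this hash
                f.write(data)
            return path
        with open(path, "rb") as f:           # hash hit: verify full text
            if f.read() == data:
                return path                   # same content, reuse it
        suffix += 1                           # real collision: next suffix
]]]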

The protocol must never rely on the hash as a unique identifier for the content 
- it should use it merely as a checksum to verify integrity.

As SVN is not a crypto product, neither integrity nor security should depend on 
the strength of a specific hash algorithm.


The second point is to reduce the likelihood of collisions, which is discussed 
in this (sub-)thread.

I tend to think that if SVN manages to work fine in case of collisions, then 
reducing the likelihood of collisions might not be worth that much effort. 
Collisions will still be a rare event, after all. 

(And even those security researchers who work with sets of colliding 
files will still have a working system, although it might not be fully 
optimized.)

As we already have both SHA1 and MD5 established, we could simply combine 
the two. That should deliver acceptable performance, while still keeping the 
collision probability low enough.
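
For illustration only (my own sketch, not a concrete proposal), both
digests can be computed in a single pass over the stream and
concatenated into one key:

[[[
import hashlib

def combined_key(stream, bufsize=1 << 16):
    # One pass over the data, feeding both digests at once.
    sha1, md5 = hashlib.sha1(), hashlib.md5()
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        sha1.update(chunk)
        md5.update(chunk)
    return sha1.hexdigest() + md5.hexdigest()

# e.g. key = combined_key(open(path, "rb"))
]]]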

Or maybe it's just not worth any effort, once the first point is solved.

DoSing a repository by checking in many colliding files still requires write 
access, and there are more effective ways to cause damage once you have write 
access.


Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: 
http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915

> From: Garance A Drosehn [mailto:dro...@rpi.edu]
> Sent: Thursday, March 02, 2017 7:32 PM
> To: Daniel Shahaf
> Cc: Subversion Development
> Subject: Re: Files with identical SHA1 breaks the repo
> 
> On 2 Mar 2017, at 6:37, Daniel Shahaf wrote:
> 
> > Garance A Drosehn wrote on Wed, Mar 01, 2017 at 14:48:07 -0500:
> >>
> >> I do not see how this extended-sha1 would be easier to break than
> >> the current sha1, because it includes the current sha1, unchanged.
> >
> > Regarding "easier to break", you were probably thinking of collision
> > attacks, which are never made easier by extending the output.  I was
> > thinking of "first preimage" attacks (finding a message that has a
> > given checksum).  Extending the output can make _these_ attacks
> > easier.
> 
> Yes, in the context of subversion I'm thinking of collision attacks,
> and only collision attacks.
> 
> > That's called a "second preimage attack".  The extended construction
> > is not secure against such an attack: a hash function with a 320-bit
> > output is supposed to require O(2³²⁰) evaluations to find a second
> > preimage, but it's trivial to find a second preimage for F and G
> > simultaneously with only O(2¹⁶⁰) evaluations.  (Do you see that?)
> 
> But consider that sha1 is a 160-bit output.  With the paper that
> Google just released, did they need to do O(2¹⁶⁰) evaluations to
> create a collision?
> 
> When I looked at their announcement at
> 
> https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
> 
> I noticed the following comments:
> 
>    "In practice, collisions should never occur for secure
>     hash functions. However if the hash algorithm has some
>     flaws, as SHA-1 does, a well-funded attacker can craft
>     a collision."
> 
> As to how much work it was:
>    "- 6,500 years of CPU computation to complete the
>       attack first phase.
>     - 110 years of GPU computation to complete the
>       second phase."
> 
> And:
>    "While those numbers seem very large, the SHA-1
>     shattered attack is still more than 100,000 times
>     faster than a brute force attack, which remains
>     impractical."
> 
> My thinking is that their attack is not a brute-force attack, and
> thus assuming the ideal of O(2¹⁶⁰) is mi