I’m not sure I understand the prior comment about compression.

I agree that hashing workflows are neither simple nor, in themselves, secure. I 
also agree with the implication that they can explode in scope.

From what I can tell, the state of hash-verification tools reflects 
substantial confusion over their utility and purpose. In some ways the effort 
is a quixotic attempt to reinvent LOCKSS or an equivalent; in other ways it is 
perfectly sensible.

I think the move to evaluate SHA-256 reflects a clear concern over tampering 
(as does the history of LOCKSS itself). This is not to say that MD5 collisions 
(much less substitutions) are mathematically trivial, but rather that they are 
now commonly contemplated.

Compare Bruce Schneier's comments about abandoning SHA-1 entirely, or 
computing's long reliance on cyclic redundancy checks (CRCs). In many ways this 
is an information-security consideration dropped into the middle of an archival 
or library workflow specification.
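
To make that distinction concrete, here is a rough sketch (the file name is 
hypothetical): a CRC is cheap and catches accidental bit flips, while a 
cryptographic digest is what you reach for once deliberate substitution is on 
the table.

    import hashlib, zlib

    data = open("example.tif", "rb").read()   # hypothetical file

    # CRC-32: cheap, catches accidental bit flips, trivial to forge on purpose.
    print("crc32 :", format(zlib.crc32(data), "08x"))
    # MD5: collisions are practical today; second preimages still are not.
    print("md5   :", hashlib.md5(data).hexdigest())
    # SHA-256: no known practical collisions; the "safe hash" option.
    print("sha256:", hashlib.sha256(data).hexdigest())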

--
Al Matthews
Software Developer, Digital Services Unit
Atlanta University Center, Robert W. Woodruff Library
email: amatth...@auctr.edu; office: 1 404 978 2057


From: Charles Blair <c...@uchicago.edu>
Organization: The University of Chicago Library
Reply-To: "c...@uchicago.edu" <c...@uchicago.edu>
Date: Friday, October 3, 2014 at 10:26 AM
To: "CODE4LIB@LISTSERV.ND.EDU" <CODE4LIB@LISTSERV.ND.EDU>
Subject: Re: [CODE4LIB] What is the real impact of SHA-256? - Updated

Look at slide 15 here:
http://www.slideshare.net/DuraSpace/sds-cwebinar-1

I think we're worried about the cumulative effect over time of
undetected errors (at least, I am).
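
A back-of-the-envelope way to see that cumulative effect (all rates here are 
made up, purely for illustration):

    # Hypothetical rates only: per-file annual probability p of a silent,
    # undetected error, across n files, over t years of retention.
    p, n, t = 1e-6, 1_000_000, 10
    prob_any = 1 - (1 - p) ** (n * t)
    print(f"chance of at least one silent corruption: {prob_any:.3%}")
    # ~99.995% with these made-up numbers -- small rates compound quickly.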

On Fri, Oct 03, 2014 at 05:37:14AM -0700, Kyle Banerjee wrote:
On Fri, Oct 3, 2014 at 3:47 PM, Simon Spero <sesunc...@gmail.com> wrote:

> Checksums can be kept separate (tripwire style).
> For JHU archiving, the use of MD5 would give false positives for duplicate
> detection.
>
> There is no reason to use a bad cryptographic hash. Use a fast hash, or use
> a safe hash.
>
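
A minimal sketch of that tripwire-style separation, assuming the digests live 
in a detached, write-protected manifest of "digest  path" lines (the manifest 
format and names are assumptions, not any particular tool's):

    import hashlib
    import sys

    def sha256_file(path, chunk_size=1 << 20):
        """Stream a file through SHA-256 so large objects are not read at once."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify(manifest_path):
        """Compare files against a detached manifest of 'digest  path' lines."""
        mismatches = []
        with open(manifest_path) as manifest:
            for line in manifest:
                expected, path = line.rstrip("\n").split("  ", 1)
                if sha256_file(path) != expected:
                    mismatches.append(path)
        return mismatches

    if __name__ == "__main__":
        for path in verify(sys.argv[1]):
            print("MISMATCH:", path)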

I have always been puzzled why so much energy is expended on bit integrity
in the library and archival communities. Hashing does not accommodate
modification of internal metadata or compression, neither of which compromises
integrity. And if the people who can access the files can also access the
hashes, there is no contribution to security. Also, wholesale hashing of
repositories scales poorly. My guess is that the biggest threats are staff
error and rogue processes (i.e., bad programming); any malicious
destruction or modification is likely to be an inside job.

In reality, file size alone is probably sufficient for detecting changed
files -- if duplicate detection is desired, only the few files that collide on
size need to be hashed. Though if duplicates are an actual issue, that reflects
problems elsewhere. Thrashing disks and cooking the CPU for the purposes
libraries put hashes to seems way overkill, especially given that basic
interaction with repositories for depositors, maintainers, and users is
still in a very primitive state.
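
A sketch of that size-first approach, with hashing used only as a tie-breaker 
among files whose sizes collide (the directory-tree layout is an assumption):

    import hashlib
    import os
    from collections import defaultdict

    def sha256_file(path, chunk_size=1 << 20):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def find_duplicates(root):
        """Group files by size first; hash only the groups where sizes collide."""
        by_size = defaultdict(list)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)
        by_digest = defaultdict(list)
        for paths in by_size.values():
            if len(paths) > 1:                      # the only files ever hashed
                for path in paths:
                    by_digest[sha256_file(path)].append(path)
        return {d: p for d, p in by_digest.items() if len(p) > 1}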

kyle


--
Charles Blair, Director, Digital Library Development Center, University of 
Chicago Library
1 773 702 8459 | c...@uchicago.edu | http://www.lib.uchicago.edu/~chas/


