Re: Hash database
On Sat, Apr 09, 2005 at 04:16:27PM +0200, Raffaele D'Elia wrote: > Unfortunatly not. I want to verify each file installed using .deb's > against the md5sum written inside the .deb itself. > Debsum does this storing the hashes locally. I want the same control > over a central db, independent from the machine I'm running debsums on. You could extract the checksums from a set of existing .deb packages with a simple script (example below), then put the generated file containing all md5sums onto some shared, read-only location, and verify whatever machine's installed files against this checksum "database". Here's a sample script for you to get started: #!/bin/sh srcdir=/var/cache/apt/archives tmpdir=/tmp/md5sums/DEBIAN targetfile=./md5sums.all rm -f $targetfile for deb in $srcdir/*.deb ; do rm -rf $tmpdir mkdir -p $tmpdir dpkg-deb -e $deb $tmpdir if [ -f $tmpdir/md5sums ] ; then cat $tmpdir/md5sums >>$targetfile else echo No md5sums for $deb! fi done After adjusting paths and running this, you should have a file md5sums.all. To verify against it, cd to / (all filenames associated with the checksums are relative to /), and run md5sum: # cd / ; md5sum -c /path/to/md5sums.all For every mismatch, you'll get "md5sum: MD5 check failed for ...". In case there are packages missing md5sums files (does happen), you can generate them with "debsums --generate", and grab them from /var/lib/dpkg/info/*.md5sums (IIRC). Is that closer to what you want? :) Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: Hash database
On Sat, Apr 09, 2005 at 03:04:31PM +0200, Raffaele D'Elia wrote: > I'm looking for hash of installed files. > > I already know debsums, but I need a something undependent from local > hash database... > > Some ideas? I know also about tripwire, but tripwire create his > reference db from the system itself, not from an unwriteable media. Not exactly sure whether I understand what you need, but the most basic (and thus most flexible) way would be something like: Generate the list of checksums: $ find . -type f | xargs md5sum > chksums Some time later, verify them: $ md5sum -c chksums Source (here '.', i.e. $PWD), and destination (chksums) can essentially be any path -- only chksums needs to be writable, of course. Before generating the hash list, think about how precisely you want the paths to end up in the file chksums, so that they'll be found when you later try to verify the files. This depends on what start-path you give to 'find' to recurse from, e.g. 'find .'will give you entries like ./some/relative/path, while 'find $PWD' will give you /the/absolute/path/to/the/file. Also, if you can't get it the way you need it right from the start, there's always sed/perl/whatever to fixup the path prefix to what it needs to be for md5sum to locate the files when verifying hashes. Is that roughly what you need? Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Wed, Aug 25, 2004 at 01:15:13AM -0400, Hubert Chan wrote: > > Ah, but then using that definition of "oneway", every hash is oneway, > since there must always be some hash value corresponding to two > different input strings (assuming the input space is larger than the > output space, which is generally the case). Since every hash is oneway, > this renders the term meaningless. So the only useful notion of oneway > is that the hash is not easily invertible (i.e. you can't easily find > some string that produces a given hash value). Okay, I guess I finally got it. Thanks for the clarifications. Let me just rephrase it in my own words to make sure my updated understanding now matches the notion commonly held in cryptography circles. No need to respond unless you still find some flaws in it :) So, if you can somehow come up with an input string (except by brute force search), which computes to some given hash, that means you inverted the function, and it's thus not oneway -- nothing more and nothing less. It has nothing to do with whether there exists some theoretic backward mapping from output to input that would uniquely retrieve the string originally used to compute the hash. The crucial point here simply was my rather different conception of invertability. So, now, the addition operation I mentioned is clearly _not_ oneway, in contrast to what I proclaimed originally ;) Makes sense now -- and makes much of what's been said so far appear in a different light. (And it hopefully explains some of the objections I had, that presumably must have seemed a little weird to anyone with a 'cryptographic' mindset...) Thanks again everyone for taking the time. -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Wed, Aug 25, 2004 at 10:01:25AM +0100, Dale Amon wrote: > > Hashing (think of cornbeef hash with things all chopped up into bits) > is a technique to generate fast lookup keys. The discussion here is > about cryptographic hash functions. ...and I think somewhere in between lie hashing functions like crc32, as used for detecting transmission errors, for example. Those are not cryptographic, but possess a sufficiently large output space, so we can expect few random collisions for most practical purposes. > I believe the primary difference is that a cryptographic hash is: > > * a uniform mapping. For input space n and output space m, > there are on average n/m strings with the same hash key. Right, but I believe that a uniform mapping _also_ is a desirable property (besides speed, of course) of hashing functions as used to compute table lookup indices -- as this property assures that the data storage locations will be spread as evenly as possible across the available buckets, which in turn minimizes the time spent on resolving collisions (on average). And, as practical considerations in this case always enforce a rather small output space (i.e. number of buckets) we're certainly expecting collisions here. [1] > * randomness. Input strings which differ by 1 bit in any > position generate hash keys a random distance apart I'd add: * huge size of the output space (with its upper limit corresponding to the number of bits of the hash value). The probability of accidentally finding a collision is of course directly related to the size of the output space (assuming a uniform mapping). I think those are the basic properties that allow for the two desirable features 'collision resistance' and 'onewayness', which we've been discussing in this thread -- but I guess I'm not really telling you anything new... ;) If anyone knows of any other requirements, please feel free to chime in... Well, OTOH, this would probably be getting a little off-topic for debian-security (especially the debian aspect). Almut [1] for anyone who doesn't know it already, I'd recommend Bob Jenkins page on hashing: http://burtleburtle.net/bob/hash/ -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Tue, Aug 24, 2004 at 11:01:58PM +0200, Moritz Schulte wrote: > > (...) But if your hash function is pretty good in respect to > collision-resistance but is not one-way (being similar to a 1:1 > mapping between hash input and hash output), you could simply apply > the inverse function to your hash output and are already done. If that was possible for md5, it would be an ingenius compression algorithm, as you could sqeeze several hundred Megs or more into 128 bits, and still be able to retrieve the original data... ;) Somewhat more seriously: are there generally any defining criteria for something one would call a 'hash function', saying that it always must map some larger input space to some smaller output space? I'm thinking of something like the following: a trivial, reversible 1:1 mapping would be to simply rotate every ASCII value in a string by some N (e.g. 1->2, 2->3, ..., 255->1). That procedure would fit the above mentioned properties, in that it's perfectly reversible, and also pretty collision-resistant -- at least, from the top of my head, I couldn't think of any reason why there should be any two inputs mapping to the same output. But I don't think that that'd be considered a hash function... (BTW, I'm not making any claims whatsoever about its usefulness in the context of computing checksums, so please don't get me wrong there.) Anyway, it's 6 a.m. here, and I got to get some sleep now... so, I won't pester you any further :) Thanks everyone for the input! Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Wed, Aug 25, 2004 at 12:39:57AM +0200, Rolf Kutz wrote: > > If you can calculate a collision from the hash and > the known password, that would be a lack off > collision resistance. Is knowing the password really a prerequisite? I'd have said that if you can find a collision at all, or calculate a collision for a given hash, that would be lack of collision resistance... > > The difference between a hash for a signature and > a hash for a password is that you know the plain > text in the first case. Sure. But does it really make such a difference for finding a collision, if you know the plaintext, rather than its hash? The latter can always be computed trivially in the usual forward fashion. (Just asking out of curiosity, not to argue in any direction whatsoever...) > > documents, you'd probably have some very specific message in mind (at > > least not some random string) that you'd like to fake as originating > > from someone else. > > This depends on how the attack really works. If > you just need to flip a few bits in a document it > might just look like typos (think crc32). If your > document is a tarball or a .deb you might be able > to insert a lot of "garbage" to it without being > noticed. I agree, in general. OTOH, if you have something like a tar.gz file, I'd presume it's rather challenging to make some change to the content being packaged in such a way that both tar.gz's still have the same md5, same size -- and unpack without error. And even more challenging, if that change is supposed to achieve a certain predefined effect... ;) Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Tue, Aug 24, 2004 at 11:09:39PM +0200, Danny De Cock wrote: > >>for password schemes, it is important that the hash function used is > >>one-way: if one knows the password, it must be very simple/easy to > >>compute the hash of that password, but if someone obtained the hash of > >>a password, it must be very difficult to find something, say z, so that > >>hash(z) equals the hash of the password. > > > >but that's property 1 then (i.e. collision resistance), isn't it? > > not exactly the same, but it is true that the two properties are somehow > related: if you can find a z so that hash(z) equals hash(x), and x and z > are different, then you have found a collision, if not, you have inverted > the hash function. I totally agree so far. > being able to invert a hash function clearly means that the function is > not collision-resistant, does it? (presuming that retrieving that x from hash(x) is not considered 'finding a collision' -- there might not necessarily exist another input value with the same hash, after all) On a related note, as hash functions typically are many-to-one type of mappings, how can they ever be inverted anyway? > > summarizing: > an attack relying on the collision-resistance takes as > input two values x and y and results in an > output hash(x) which equals hash(y), but > an attack on the one-way property of the hash function takes as > input hash(x) and produces an > output z so that hash(z) equals hash(x). > > this is quite different, isn't it? :) hmm, well, I guess that's where our notions/terminology start to differ slightly... The main difference I see there is in the intent of the attacks, not so much in the property they're trying a attack (or the lack of a property the attack tries to exploit). The first type aims at finding some y for some given x, so that their hashes do not differ, but it's irrelevant what the specific hash value is. The second type starts from some given hash(x), and tries to find some z that yields the same hash (as in the password cracking case). But both types of attacks are based on the existence of collisions. Only trying to find that very x that originally produced hash(x) would be an attack on the one-way property, IMHO. I don't quite see why the second attack must have anything to do with the function being one-way or not: even if the function was perfectly one-way, but not collision-resistant, you might still find that z (z is not necessarily the inverse of hash(x) after all -- at least not as I understand the term... :) Let me just use a trivial example - the simple addition operation - to elaborate on my notions of 'onewayness' and 'collision resistance': When we add 2+3, we get 5. Great. :) That sum kind of represents the 'hash' or checksum. This operation is clearly not reversible (i.e. when only being given the 5, you can never tell for sure whether 2+3, 1+4, -13+18, etc. were being added up). Thus, it's 'oneway', as I understand the term. But it's more than trivial to come up with many, many pairs of input values adding up to the same sum. That's essentially what I thought is called 'finding a collision' -- leaving it entirely open whether this can generally be done for some given hash (sum), or only for some prior unknown hash... Anyhow, as open as my definitions currently are, just as open am I to adjust them as required :) I guess we're basically all having the same concepts in the back of our minds. We're just using different terminology (or the same terminology in different ways) to talk about them -- which is always an unfailing recipe for causing confusion... Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Tue, Aug 24, 2004 at 09:18:46PM +0200, Danny De Cock wrote: > On Tue, 24 Aug 2004, Almut Behrens wrote: > > >On Tue, Aug 24, 2004 at 12:44:53PM +0200, Danny De Cock wrote: > >> > >>(...) but the verification of password hashes, such as used in pam, > >>rely on the hash function's oneway-feature rather than on its > >>collision-resistance... > > > >not sure I understand -- so, if someone would like to explain this > >aspect to the non-cryptographist, please go ahead :) > > a cryptographic hash function, such as md5, sha1, ripemd-160, to name the > most commonly used cryptographic hash functions are constructed to have at > least the following properties: > > 1. it is hard to find two distinct inputs to the hash function, say x and > y, so that hash(x) equals hash(y) > > 2. they are one-way, i.e., it is hard to find the value x given hash(x) just to make sure we're using the same terminology: 1. is what I'd consider collision resistance, whereas the oneway aspect (2.) refers to the difficulty of retrieving the original string (x above) used in computing the hash in the first place. > for password schemes, it is important that the hash function used is > one-way: if one knows the password, it must be very simple/easy to compute > the hash of that password, but if someone obtained the hash of a password, > it must be very difficult to find something, say z, so that hash(z) equals > the hash of the password. but that's property 1 then (i.e. collision resistance), isn't it? And that's essentially what I was trying to point out, as I don't think that, WRT password verification, you'll ever need to know the original x. It's completely sufficient to find some other password y, z, or whatever, such that hash(some_password) == stored_hash where the stored/given hash has originally been computed as hash(x). Thus, I'd still say it's not the oneway aspect that matters here, but rather the collision resistance of the hash function... Of course, as Mike has already pointed out, it's a completely different story whether you can find _any_ collision (for an arbitray hash value), or a collision for some _given_ cryptographic hash value. Otherwise, I hardly have any objections to what you wrote :) > > does this clarify things a bit more? :)) not so sure... :) -- i.e. I don't really see a huge conceptual difference between two 'passwords' or 'documents' hashing to the same value... Also, here again, as I tried to point out in my previous post, I'd say that with finding passwords, you have more degrees of freedom. All that matters is that their hashes are identical, when you want to get access -- the string itself is totally irrelevant. While with signing documents, you'd probably have some very specific message in mind (at least not some random string) that you'd like to fake as originating from someone else. Thanks, Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Re: MD5 collisions found - alternative?
On Tue, Aug 24, 2004 at 12:44:53PM +0200, Danny De Cock wrote: > > (...) but the verification > of password hashes, such as used in pam, rely on the hash function's > oneway-feature rather than on its collision-resistance... not sure I understand -- so, if someone would like to explain this aspect to the non-cryptographist, please go ahead :) I always thought that the oneway-feature is not particularly relevant when verifying passwords... In other words, if you can find (within a reasonable amount of time) any input string that produces the same given digest, then any password verification system will let you in, independently of whether you ever get to know the original password... In that case, I believe, you'd even have less constraints to satisfy, than when trying to find a collision for a message digest used to verify the integrity of some executable, for example (to protect against trojans, etc.). In the latter case, a large portion of the input would have to be identical (so the trojan will essentially do the same thing as the real program). This means you're only left with some smaller fraction of the binary to fiddle with in an attempt to yield the same message digest for both program versions (compensating for the modifications you made to the code). I'd suspect that the reduced degrees of freedom in the latter case would make it harder to find an appropriate collision. Or am I completely on the wrong track? Just wondering... Almut -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]