Re: Hash database

2005-04-09 Thread Almut Behrens
On Sat, Apr 09, 2005 at 04:16:27PM +0200, Raffaele D'Elia wrote:
> Unfortunatly not. I want to verify each file installed using .deb's  
> against the md5sum written inside the .deb itself.
> Debsum does this storing the hashes locally. I want the same control 
> over a central db, independent from the machine I'm running debsums on.

You could extract the checksums from a set of existing .deb packages
with a simple script (example below), then put the generated file
containing all md5sums onto some shared, read-only location, and verify
whatever machine's installed files against this checksum "database".

Here's a sample script for you to get started:

#!/bin/sh

srcdir=/var/cache/apt/archives
tmpdir=/tmp/md5sums/DEBIAN
targetfile=./md5sums.all

rm -f $targetfile

for deb in $srcdir/*.deb ; do
  rm -rf $tmpdir
  mkdir -p $tmpdir
  dpkg-deb -e $deb $tmpdir
  if [ -f $tmpdir/md5sums ] ; then
cat $tmpdir/md5sums >>$targetfile
  else
echo No md5sums for $deb!
  fi
done

After adjusting paths and running this, you should have a file
md5sums.all. To verify against it, cd to / (all filenames associated
with the checksums are relative to /), and run md5sum:

# cd / ; md5sum -c /path/to/md5sums.all

For every mismatch, you'll get "md5sum: MD5 check failed for ...".

In case there are packages missing md5sums files (does happen), you
can generate them with "debsums --generate", and grab them from
/var/lib/dpkg/info/*.md5sums (IIRC).

Is that closer to what you want? :)

Almut


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: Hash database

2005-04-09 Thread Almut Behrens
On Sat, Apr 09, 2005 at 03:04:31PM +0200, Raffaele D'Elia wrote:
> I'm looking for hash of installed files.
> 
> I already know debsums, but I need a something undependent from local 
> hash database...
> 
> Some ideas? I know also about tripwire, but tripwire create his 
> reference db from the system itself, not from an unwriteable media.

Not exactly sure whether I understand what you need, but the most basic
(and thus most flexible) way would be something like:

Generate the list of checksums:

$ find . -type f | xargs md5sum > chksums

Some time later, verify them:

$ md5sum -c chksums

Source (here '.', i.e. $PWD), and destination (chksums) can essentially
be any path -- only chksums needs to be writable, of course.

Before generating the hash list, think about how precisely you want the
paths to end up in the file chksums, so that they'll be found when you
later try to verify the files.
This depends on what start-path you give to 'find' to recurse from, e.g.
'find .'will give you entries like ./some/relative/path, while
'find $PWD' will give you /the/absolute/path/to/the/file.

Also, if you can't get it the way you need it right from the start,
there's always sed/perl/whatever to fixup the path prefix to what it
needs to be for md5sum to locate the files when verifying hashes.

Is that roughly what you need?

Almut


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-25 Thread Almut Behrens
On Wed, Aug 25, 2004 at 01:15:13AM -0400, Hubert Chan wrote:
> 
> Ah, but then using that definition of "oneway", every hash is oneway,
> since there must always be some hash value corresponding to two
> different input strings (assuming the input space is larger than the
> output space, which is generally the case).  Since every hash is oneway,
> this renders the term meaningless.  So the only useful notion of oneway
> is that the hash is not easily invertible (i.e. you can't easily find
> some string that produces a given hash value).

Okay, I guess I finally got it.  Thanks for the clarifications.
Let me just rephrase it in my own words to make sure my updated
understanding now matches the notion commonly held in cryptography
circles.  No need to respond unless you still find some flaws in it :)

So, if you can somehow come up with an input string (except by brute
force search), which computes to some given hash, that means you
inverted the function, and it's thus not oneway -- nothing more and
nothing less.  It has nothing to do with whether there exists some
theoretic backward mapping from output to input that would uniquely
retrieve the string originally used to compute the hash.

The crucial point here simply was my rather different conception of
invertability.  So, now, the addition operation I mentioned is clearly
_not_ oneway, in contrast to what I proclaimed originally ;)

Makes sense now -- and makes much of what's been said so far appear in
a different light. (And it hopefully explains some of the objections I
had, that presumably must have seemed a little weird to anyone with a
'cryptographic' mindset...)

Thanks again everyone for taking the time.


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-25 Thread Almut Behrens
On Wed, Aug 25, 2004 at 10:01:25AM +0100, Dale Amon wrote:
> 
> Hashing (think of cornbeef hash with things all chopped up into bits)
> is a technique to generate fast lookup keys. The discussion here is
> about cryptographic hash functions.

...and I think somewhere in between lie hashing functions like crc32,
as used for detecting transmission errors, for example.  Those are
not cryptographic, but possess a sufficiently large output space, so we
can expect few random collisions for most practical purposes.


> I believe the primary difference is that a cryptographic hash is:
> 
>   * a uniform mapping. For input space n and output space m,
> there are on average n/m strings with the same hash key.

Right, but I believe that a uniform mapping _also_ is a desirable
property (besides speed, of course) of hashing functions as used to
compute table lookup indices -- as this property assures that the data
storage locations will be spread as evenly as possible across the
available buckets, which in turn minimizes the time spent on resolving
collisions (on average).  And, as practical considerations in this case
always enforce a rather small output space (i.e. number of buckets)
we're certainly expecting collisions here. [1]

>   * randomness. Input strings which differ by 1 bit in any
> position generate hash keys a random distance apart

I'd add:
  * huge size of the output space (with its upper limit corresponding
to the number of bits of the hash value).  The probability of
accidentally finding a collision is of course directly related to
the size of the output space (assuming a uniform mapping).

I think those are the basic properties that allow for the two desirable
features 'collision resistance' and 'onewayness', which we've been
discussing in this thread -- but I guess I'm not really telling you
anything new... ;)
If anyone knows of any other requirements, please feel free to chime
in... Well, OTOH, this would probably be getting a little off-topic for
debian-security (especially the debian aspect).

Almut


[1] for anyone who doesn't know it already, I'd recommend Bob Jenkins
page on hashing: http://burtleburtle.net/bob/hash/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-24 Thread Almut Behrens
On Tue, Aug 24, 2004 at 11:01:58PM +0200, Moritz Schulte wrote:
> 
> (...)  But if your hash function is pretty good in respect to
> collision-resistance but is not one-way (being similar to a 1:1
> mapping between hash input and hash output), you could simply apply
> the inverse function to your hash output and are already done.

If that was possible for md5, it would be an ingenius compression
algorithm, as you could sqeeze several hundred Megs or more into 128
bits, and still be able to retrieve the original data... ;)

Somewhat more seriously: are there generally any defining criteria for
something one would call a 'hash function', saying that it always must
map some larger input space to some smaller output space?

I'm thinking of something like the following:  a trivial, reversible
1:1 mapping would be to simply rotate every ASCII value in a string by
some N (e.g. 1->2, 2->3, ..., 255->1).  That procedure would fit the
above mentioned properties, in that it's perfectly reversible, and also
pretty collision-resistant -- at least, from the top of my head, I
couldn't think of any reason why there should be any two inputs mapping
to the same output.  But I don't think that that'd be considered a hash
function...
(BTW, I'm not making any claims whatsoever about its usefulness in the
context of computing checksums, so please don't get me wrong there.)

Anyway, it's 6 a.m. here, and I got to get some sleep now... so, I
won't pester you any further :)

Thanks everyone for the input!

Almut


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-24 Thread Almut Behrens
On Wed, Aug 25, 2004 at 12:39:57AM +0200, Rolf Kutz wrote:
> 
> If you can calculate a collision from the hash and
> the known password, that would be a lack off
> collision resistance.

Is knowing the password really a prerequisite?  I'd have said that if
you can find a collision at all, or calculate a collision for a given
hash, that would be lack of collision resistance...

> 
> The difference between a hash for a signature and
> a hash for a password is that you know the plain
> text in the first case.

Sure. But does it really make such a difference for finding a
collision, if you know the plaintext, rather than its hash?  The
latter can always be computed trivially in the usual forward fashion.
(Just asking out of curiosity, not to argue in any direction
whatsoever...)


> > documents, you'd probably have some very specific message in mind (at
> > least not some random string) that you'd like to fake as originating
> > from someone else.
> 
> This depends on how the attack really works. If
> you just need to flip a few bits in a document it
> might just look like typos (think crc32). If your
> document is a tarball or a .deb you might be able
> to insert a lot of "garbage" to it without being
> noticed.

I agree, in general.  OTOH, if you have something like a tar.gz file,
I'd presume it's rather challenging to make some change to the content
being packaged in such a way that both tar.gz's still have the same
md5, same size -- and unpack without error.  And even more challenging,
if that change is supposed to achieve a certain predefined effect... ;)

Almut


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-24 Thread Almut Behrens
On Tue, Aug 24, 2004 at 11:09:39PM +0200, Danny De Cock wrote:
> >>for password schemes, it is important that the hash function used is 
> >>one-way: if one knows the password, it must be very simple/easy to 
> >>compute the hash of that password, but if someone obtained the hash of 
> >>a password, it must be very difficult to find something, say z, so that 
> >>hash(z) equals the hash of the password.
> >
> >but that's property 1 then (i.e. collision resistance), isn't it?
> 
> not exactly the same, but it is true that the two properties are somehow 
> related: if you can find a z so that hash(z) equals hash(x), and x and z 
> are different, then you have found a collision, if not, you have inverted 
> the hash function.

I totally agree so far.

> being able to invert a hash function clearly means that the function is 
> not collision-resistant,

does it?
(presuming that retrieving that x from hash(x) is not considered
'finding a collision' -- there might not necessarily exist another
input value with the same hash, after all)

On a related note, as hash functions typically are many-to-one type of
mappings, how can they ever be inverted anyway?

> 
> summarizing:
>  an attack relying on the collision-resistance takes as
>   input two values x and y and results in an
>   output hash(x) which equals hash(y), but
>  an attack on the one-way property of the hash function takes as
>   input hash(x) and produces an
>   output z so that hash(z) equals hash(x).
> 
> this is quite different, isn't it? :)

hmm, well, I guess that's where our notions/terminology start to differ
slightly...
The main difference I see there is in the intent of the attacks, not
so much in the property they're trying a attack (or the lack of a
property the attack tries to exploit).  The first type aims at
finding some y for some given x, so that their hashes do not differ,
but it's irrelevant what the specific hash value is. The second type
starts from some given hash(x), and tries to find some z that yields
the same hash (as in the password cracking case).  But both types of
attacks are based on the existence of collisions.  Only trying to find
that very x that originally produced hash(x) would be an attack on the
one-way property, IMHO.

I don't quite see why the second attack must have anything to do with
the function being one-way or not: even if the function was perfectly
one-way, but not collision-resistant, you might still find that z
(z is not necessarily the inverse of hash(x) after all -- at least not
as I understand the term... :)

Let me just use a trivial example - the simple addition operation - to
elaborate on my notions of 'onewayness' and 'collision resistance':
When we add 2+3, we get 5.  Great. :)  That sum kind of represents the
'hash' or checksum. This operation is clearly not reversible (i.e. when
only being given the 5, you can never tell for sure whether 2+3, 1+4,
-13+18, etc. were being added up). Thus, it's 'oneway', as I understand
the term.  But it's more than trivial to come up with many, many pairs
of input values adding up to the same sum.  That's essentially what I
thought is called 'finding a collision' -- leaving it entirely open
whether this can generally be done for some given hash (sum), or only
for some prior unknown hash...

Anyhow, as open as my definitions currently are, just as open am I to
adjust them as required :)

I guess we're basically all having the same concepts in the back of our
minds.  We're just using different terminology (or the same terminology
in different ways) to talk about them -- which is always an unfailing
recipe for causing confusion...

Almut



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-24 Thread Almut Behrens
On Tue, Aug 24, 2004 at 09:18:46PM +0200, Danny De Cock wrote:
> On Tue, 24 Aug 2004, Almut Behrens wrote:
> 
> >On Tue, Aug 24, 2004 at 12:44:53PM +0200, Danny De Cock wrote:
> >>
> >>(...) but the verification of password hashes, such as used in pam, 
> >>rely on the hash function's oneway-feature rather than on its 
> >>collision-resistance...
> >
> >not sure I understand -- so, if someone would like to explain this 
> >aspect to the non-cryptographist, please go ahead :)
> 
> a cryptographic hash function, such as md5, sha1, ripemd-160, to name the 
> most commonly used cryptographic hash functions are constructed to have at 
> least the following properties:
> 
> 1. it is hard to find two distinct inputs to the hash function, say x and 
> y, so that hash(x) equals hash(y)
> 
> 2. they are one-way, i.e., it is hard to find the value x given hash(x)

just to make sure we're using the same terminology:  1. is what I'd
consider collision resistance, whereas the oneway aspect (2.) refers to
the difficulty of retrieving the original string (x above) used in
computing the hash in the first place.


> for password schemes, it is important that the hash function used is 
> one-way: if one knows the password, it must be very simple/easy to compute 
> the hash of that password, but if someone obtained the hash of a password, 
> it must be very difficult to find something, say z, so that hash(z) equals 
> the hash of the password.

but that's property 1 then (i.e. collision resistance), isn't it?
And that's essentially what I was trying to point out, as I don't think
that, WRT password verification, you'll ever need to know the original x.
It's completely sufficient to find some other password y, z, or
whatever, such that

  hash(some_password) == stored_hash

where the stored/given hash has originally been computed as hash(x).

Thus, I'd still say it's not the oneway aspect that matters here, but
rather the collision resistance of the hash function...

Of course, as Mike has already pointed out, it's a completely different
story whether you can find _any_ collision (for an arbitray hash
value), or a collision for some _given_ cryptographic hash value.

Otherwise, I hardly have any objections to what you wrote :)

> 
> does this clarify things a bit more? :))

not so sure... :)  -- i.e. I don't really see a huge conceptual
difference between two 'passwords' or 'documents' hashing to the same
value...

Also, here again, as I tried to point out in my previous post, I'd say
that with finding passwords, you have more degrees of freedom.  All
that matters is that their hashes are identical, when you want to get
access -- the string itself is totally irrelevant.  While with signing
documents, you'd probably have some very specific message in mind (at
least not some random string) that you'd like to fake as originating
from someone else.

Thanks,
Almut


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]



Re: MD5 collisions found - alternative?

2004-08-24 Thread Almut Behrens
On Tue, Aug 24, 2004 at 12:44:53PM +0200, Danny De Cock wrote:
>
> (...) but the verification
> of password hashes, such as used in pam, rely on the hash function's
> oneway-feature rather than on its collision-resistance...

not sure I understand -- so, if someone would like to explain this
aspect to the non-cryptographist, please go ahead :)

I always thought that the oneway-feature is not particularly relevant
when verifying passwords...  In other words, if you can find (within a
reasonable amount of time) any input string that produces the same
given digest, then any password verification system will let you in,
independently of whether you ever get to know the original password...

In that case, I believe, you'd even have less constraints to satisfy,
than when trying to find a collision for a message digest used to
verify the integrity of some executable, for example (to protect
against trojans, etc.).  In the latter case, a large portion of the
input would have to be identical (so the trojan will essentially do the
same thing as the real program). This means you're only left with some
smaller fraction of the binary to fiddle with in an attempt to yield
the same message digest for both program versions (compensating for the
modifications you made to the code).

I'd suspect that the reduced degrees of freedom in the latter case
would make it harder to find an appropriate collision.  Or am I
completely on the wrong track?  Just wondering...

Almut


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]