On Tue, 20 Jun 2000, Ben Laurie wrote:

> Matt Blaze wrote:
> > 
> > I should point out that this construction is not designed to obscure the
> > input from the output (especially under differential probing), only
> > to give you m output bits that depend (each in a different way) on
> > the entire input.
> 
> Perhaps I should add that as a requirement. OTOH, assuming H is perfect,
> wouldn't that make this construction resistant? But I assume you are
> reluctant to attempt to prove that.

Out of curiosity, has anyone come up with a formal definition of
security for this problem? How would we go about proving that a
passphrase condenser is "good"?

Here's what I can come up with:

In this discussion we have as criteria:
        * every output bit depends on every input bit
        * no correlations between bits
plus now possibly
        * unable to recover input from output
        * non-linear
which we don't seem to be 100% sure are necessary.
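
Neither of the first two criteria seems hard to check empirically, at
least crudely. Here's a throwaway Python sketch (truncated SHA-1 is
just a stand-in condenser, not anyone's proposal from this thread)
that flips each input bit and counts how often each output bit
changes; every counter should hover around half the trials:

    import hashlib, os

    def C(passphrase, outlen=8):
        # stand-in condenser, purely for illustration: truncated SHA-1
        return hashlib.sha1(passphrase).digest()[:outlen]

    def bits(b):
        return [(byte >> i) & 1 for byte in b for i in range(8)]

    def dependence_check(inlen=16, outlen=8, trials=200):
        # counts[i][j] = how often flipping input bit i changed output
        # bit j; ideally every entry sits near trials/2
        counts = [[0] * (outlen * 8) for _ in range(inlen * 8)]
        for _ in range(trials):
            x = bytearray(os.urandom(inlen))
            base = bits(C(bytes(x), outlen))
            for i in range(inlen * 8):
                x[i // 8] ^= 1 << (i % 8)
                flipped = bits(C(bytes(x), outlen))
                x[i // 8] ^= 1 << (i % 8)   # undo the flip
                for j in range(outlen * 8):
                    counts[i][j] += base[j] ^ flipped[j]
        return counts

    counts = dependence_check()
    print("least-affected counter:", min(min(r) for r in counts), "of 200")

A counter near zero means some output bit ignores some input bit
(criterion one fails); counters far from trials/2 point at
correlations.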

A passphrase condenser should also be as close to collision-free as
possible, shouldn't it? Otherwise there are passphrases P and P' such
that, for a passphrase condenser C(), we have C(P) = C(P'). The
adversary can then guess either P or P', apply C(), and so has two
chances instead of one to hit a preimage of the key at hand. What I
find confusing right now is that P and P' are of arbitrary length, so
it's not clear to me how to treat the case where the original
passphrase P is, say, 100 bits long and P' is 100,000,000 bits long,
versus the case where P is 100 bits and P' is 20 bits. It seems to me
that the first case is not much cause for concern, while the second
case might be. (On the other hand, if in the first case the length and
construction of P' are "obvious" from the construction of the
passphrase condenser, while in the second case they are hard to
determine, this might not hold.) (I can see I need to look at hash
functions again.)
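
To make the "two chances" point concrete, here's a toy Python sketch
with an artificially short 16-bit output, so collisions actually show
up in a small made-up candidate list:

    import hashlib

    def C(p):
        # toy condenser with a deliberately tiny 16-bit output
        return hashlib.sha1(p.encode()).digest()[:2]

    def find_collision(candidates):
        seen = {}
        for p in candidates:
            v = C(p)
            if v in seen and seen[v] != p:
                return seen[v], p
            seen[v] = p
        return None

    pair = find_collision("pass%d" % i for i in range(100000))
    if pair:
        print("C(%r) == C(%r)" % pair)

If both P and P' land in the dictionary the adversary guesses from,
either guess yields the key derived from C(P).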

Maybe another way to think of it is in terms of how an adversary would
view the passphrase condenser:

The assumption here is that the passphrase P isn't a random string.
Does it make sense to talk about a quantity ENT = "the amount of
entropy in P" such that an adversary will take time 2^ENT to guess P,
with 0 <= ENT <= |P| (equality if P is random)? What about adversaries
that learn more about the passphrase and so could 'decrease' ENT; is
that relevant?

If so, then it seems to me that a "good" passphrase condenser C()
might be one that forces an adversary, given ENT, the description of
C(), and L = the length of C(P), to do exactly min(2^ENT, 2^L) work to
obtain C(P). It can't force the adversary to do more than that, since
the adversary can always exhaustively guess either P or C(P),
whichever takes less work. If the adversary can obtain C(P) in less
time than that, it has an advantage over guessing and something is
broken. (Should this be relaxed to allow a negligible advantage over
min(2^ENT, 2^L)?) So a possible "security parameter" might be
min(ENT, L).
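
As a sanity check on the arithmetic, with made-up parameter values:

    def work(ent_bits, out_bits):
        # guessing P exhaustively costs ~2^ENT;
        # guessing C(P) directly costs ~2^L
        return min(2 ** ent_bits, 2 ** out_bits)

    for ent, L in [(40, 128), (160, 64)]:
        print("ENT=%d, L=%d -> about 2^%d work"
              % (ent, L, work(ent, L).bit_length() - 1))

In the first case the weak passphrase is the bottleneck; in the
second, the short output is.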

The adversary can also make queries to C() on arbitrary inputs. The
number of queries should be bounded by a polynomial in min(ENT, L),
and the length of each query should probably be bounded too, by
another polynomial in min(ENT, L). This is roughly analogous to a
chosen plaintext attack on a public key scheme, but it needs a
different name.
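
One way to picture that attack model is black-box access to C()
through an oracle that enforces the query and length budgets. A
minimal sketch, with a stand-in condenser and made-up bounds:

    import hashlib

    class CondenserOracle:
        """Black-box access to C(), limited by query and length budgets."""
        def __init__(self, condenser, max_queries, max_len):
            self.condenser = condenser
            self.max_queries = max_queries
            self.max_len = max_len
            self.queries = 0

        def query(self, x):
            if self.queries >= self.max_queries or len(x) > self.max_len:
                raise RuntimeError("query budget exceeded")
            self.queries += 1
            return self.condenser(x)

    oracle = CondenserOracle(lambda p: hashlib.sha1(p).digest()[:8],
                             max_queries=1000, max_len=4096)
    print(oracle.query(b"some candidate passphrase").hex())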

I suppose you could also allow an analogue of a "chosen ciphertext
attack", in which the adversary has access to C(f(P)), where f is any
function *other than the identity* chosen by the adversary, and it can
try polynomially many different functions. This would model users who
use similar passphrases (e.g. ones sharing a prefix) on different
systems.
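
The same picture extends to this analogue: the adversary hands the
oracle a function f (anything but the identity) and gets back C(f(P))
on the secret passphrase. Again just a sketch to pin the idea down;
the secret and the chosen f are made up:

    import hashlib

    def C(p):
        # stand-in condenser for illustration
        return hashlib.sha1(p).digest()[:8]

    class RelatedPassphraseOracle:
        """Holds a secret P; answers C(f(P)) for adversary-chosen f."""
        def __init__(self, secret):
            self._secret = secret

        def query(self, f):
            fp = f(self._secret)
            if fp == self._secret:
                raise ValueError("the identity function is not allowed")
            return C(fp)

    oracle = RelatedPassphraseOracle(b"correct horse battery staple")
    # e.g. guess that the user appends a hostname on the other system
    print(oracle.query(lambda p: p + b"-workstation").hex())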

This is only useful if it gives us some insight into what properties
are and are not required of a passphrase condenser. So far, it seems:

If ENT > L, then C(P) should be uniformly distributed over all L-bit
strings. Otherwise the adversary can guess C(P) in fewer than 2^L
steps simply by exploiting the bias. If ENT < L, the bias doesn't
matter until it becomes so great that exploiting it lets the adversary
do less than 2^ENT work.
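
The uniformity requirement is also easy to eyeball for tiny L. A
quick sketch, truncating to 8 output bits so the histogram is visible
and using random 16-byte inputs to stand in for passphrases with
plenty of entropy:

    import hashlib, os
    from collections import Counter

    def C8(p):
        # condenser truncated to L = 8 bits, so the statistics are visible
        return hashlib.sha1(p).digest()[0]

    counts = Counter(C8(os.urandom(16)) for _ in range(256 * 200))
    worst = max(abs(c - 200) for c in counts.values())
    print("largest deviation from the expected 200 per value:", worst)

A badly skewed histogram would let the adversary rank its guesses for
C(P) and beat 2^L.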

If ENT > L, and the passphrase may be shared between two different
systems using the same C(), then we need the property that preimages
of C(P) are "difficult" to find. Otherwise, once the adversary has
C(P), it could easily find a preimage and go off to attack the other
system. If L > ENT, it's easier to just guess the passphrase P
directly.

I don't see anything else at the moment.

The requirement for a uniform distribution in the case ENT > L makes
me think that some construction based on universal hash functions
might be worth looking at, but I think that fails the requirement that
preimages be hard to find...
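
To spell out that worry: take one member of the classic universal
family h(x) = (a*x + b) mod p. It's nicely uniform, but once a and b
are public, anyone holding the output can walk straight back to a
preimage, which is exactly what we don't want when the passphrase is
shared. A sketch with made-up parameters:

    # one member of the universal family h(x) = (a*x + b) mod p;
    # p plays the role of the 2^L-sized output space
    p = (1 << 61) - 1
    a, b = 0x123456789abcdef, 0xfedcba987654321

    def h(x):
        return (a * x + b) % p

    def invert(y):
        # h is affine, so a preimage is one modular inverse away
        return ((y - b) * pow(a, -1, p)) % p

    x = int.from_bytes(b"hunter2", "big")  # a short passphrase as a number
    print("recovered:", invert(h(x)).to_bytes(7, "big"))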

In any case, if this stab at a definition is any good, maybe a first
step would be to see whether a collection of mn random
one-bit-to-one-bit functions satisfies it or not. Then move on to
David Wagner's proposal, treating H as a random oracle, and then on to
constructions with real hash functions.
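
For the first of those, I'm reading "a collection of mn random
one-bit-to-one-bit functions" as: output bit i is the XOR of
f_ij(x_j) over j, with each f_ij drawn from the four possible one-bit
functions. If that reading is wrong, ignore the following; if it's
right, a sketch looks like:

    import random

    # the only four functions from one bit to one bit
    ONE_BIT_FUNCS = [
        lambda b: 0,        # constant 0
        lambda b: 1,        # constant 1
        lambda b: b,        # identity
        lambda b: 1 - b,    # complement
    ]

    def make_condenser(n, m, seed=0):
        rng = random.Random(seed)
        # pick the m*n random one-bit functions f[i][j]
        f = [[rng.choice(ONE_BIT_FUNCS) for _ in range(n)]
             for _ in range(m)]
        def condense(xbits):
            # output bit i = XOR over j of f[i][j](input bit j)
            return [sum(f[i][j](xbits[j]) for j in range(n)) % 2
                    for i in range(m)]
        return condense

    C = make_condenser(n=32, m=8)
    print(C([random.getrandbits(1) for _ in range(32)]))

Under that reading the whole thing is affine over GF(2), so a handful
of queries recovers the map; that would make it a quick first test
case against the definition.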

-dmolnar


