On 25/08/17 04:21, Thomas Schmitt wrote: > One can estimate entropy by an approximation of the best possible > compression in the context of the knowledge of the reader. > The compression result will generally be longer if the compressor has > fewer knowledge about the message.
In principle, yes, but in practice, not at all. File compressors are designed assuming that the common case will be compression of data much longer than passwords (at least 1 MB). The behavior for message less than 100 B long will be highly anomalous. Moreover, the meta-data (like magic number, container, et cetera) add overhead to the compressed file. If we interpret compressed length as entropy this will inflate your estimate of entropy by tens of bytes, which is enough to make it useless. The problem trying to estimate entropy of a message M' given a prior message M (the _context_ in your wording) can be formulated mathematically in terms of Kolmogorov complexity. Unfortunately, determining “the” Kolmogorov complexity of a message (given an universal encoding scheme, for example, programs in untyped λ-calculus) is algorithmically undecidable. Worse yet, Chaitin proved a theorem (now called Chaitin incompleteness theorem) that for any consistent formal system there exist a bound N such that the formal system can not prove that “the” Kolmogorov complexity of any specific string is higher than N. > The second password class and my knowledge about it gives me not more > than a reduction of text bit number by 25 percent (6 bit text -> 8 bit > binary) and a couple of bits which are harder to harvest. > E.g. i know that a dictionary attack is of few use. That's one bit, > because it's the first decision i can make. Any further insight might add > only a fraction of a bit. (It's probabilistic. So we can grind bits to dust.) This is a somewhat oversimplified analysis. You know beforehand that a password is almost surely a sequence of printable characters among the allocated code points in Unicode. If you know the program in which the password has been input, then you can know the character encoding as well. Assuming it is UTF-8, you can discard a large fraction of all possible 8-bit strings (not all 8-bit strings are valid UTF-8). Thus the prior distribution has less than 8 bits of entropy per bit. -- Do not eat animals, respect them as you respect people. https://duckduckgo.com/?q=how+to+(become+OR+eat)+vegan
signature.asc
Description: OpenPGP digital signature