List: The recent thread on "AES-128 keys unique for fixed plaintext/ciphertext pair" included a discussion on unicity, with some broken dialogues. I wrote -up a summary that I'm sending to this list as a possible seed for further comments. I apologize for any mistakes or imprecision, as I'm not trying to be as exact as possible -- just sufficiently exact for the purpose at hand. I also provide below the online references for Shannon's works [Sha48, Sha49] that are important to this discussion.
The AES thread discussion is NOT included here. 1. WHAT IS UNICITY? There are three different contexts to answer to this question! 1.a. Unicity Definition: Shannon [Sha49, page 693] defined "unicity distance" (hereafter, "n") as the least amount of plaintext which can be uniquely deciphered from the corresponding ciphertext, allowing one to determine, without doubt, the key that was used for encryption. The "amount" of plaintext (i.e., "n") can be measured in any units the user may find convenient, such as bits, bytes, letters, symbols, etc. Actually, Shannon used "letters" in his paper. NOTE 1: This is a definition. There is no proof involved here. 1.b. Unicity Model: As first given by Shannon [Sha49] under some restrictive assumptions, specially the "random cipher" assumption, the mathematical expression for unicity can be cast in the following unfolded expression (his original expression was n = H(K)/D, where D is the redundancy): n = H(K)/[|M| - H(M)] where the quantities are: n = unicity; least message length that can be uniquely deciphered H(K) = entropy of keys used in encryption |M| = maximum possible entropy for the plaintext H(M) = entropy of actual message, the plaintext and the entropies are calculated accordingly to the desired units (bits, bytes, letters, symbols, etc.), which also define the unit for n. NOTE 1: The model for unicity has no probability error with a tail to infinity because only entropy values are used in the formula of n and by *definition* of entropy the entropy is already a limit to infinity. NOTE 2: It does not matter how the attacker may try to decipher the message. The attacker can of course use brute-force and try out all keys or he can use short-cuts, it is his choice and he is entirely free to use any method he desires. The work involved may be small, quite large or even unbounded -- the amount of work is actually unspecified. NOTE 3: Shannon's definition of "random cipher" was that "all decipherments must produce a random flat distribution over all bits in the plaintext space." 1.c. Unicity Value: The numerical value of n. It is important not to confuse a model with a measurement. Models predict measurements, and do so within an error range. What is the the error range for measuring n? First, note that the model works for any ciphertext, any plaintext. And for any such pairs, the result "n" is predicted by the model even if an attacker has unbounded resources, including infinite time. The value of "n" depends on the maximum possible entropy for the plaintext, the plaintext entropy, the entropy of the keys and the assumption that the cipher is a random cipher. Since all good ciphers should be a random cipher, for those ciphers the model provides a good approximation to what "n" actually is. The practical difficulty of reliably estimating the plaintext entropy and even the key entropy (which errors contribute to an error in "n") has nothing to do with the model itself or its error for "n", but on the errors for the quantities on which it depends -- however, it's not so hard to obtain good estimates and several are well-known. NOTE 1: Estimating the entropy of English (and other languages) has been the subject of considerable study. Various authors have measured H(M) for English texts and found values that lie between 1.0 and 1.5. The standard value quoted is 1.2, close to average of the extreme values. Even though each author has a different text, different preferred words, and different style preferences, we all come pretty close to the entropy value of 1.2. However, XML text (which is in English) is more redundant than natural English and should have a lower entropy. On the other hand, English text that is sent by SMS in cell phones has messages such as "Chk tat 4 u 2", where the redundancy is reduced and the entropy should be higher. NOTE 2: The benefit of compression is to increase unicity even if the compression algorithm is fully known to the attacker. If the plaintext is compressed before encipherment, then we rightly expect its entropy per compressed character to increase -- even though its entropy per English character does not increase. This is often confusing and may provide the wrong impressions that nothing is gained by compression or that we may need to "hide" the compression algorithm from the attacker. 2. READING THE FINE PRINT Of further importance and often ignored or even contradicted by some statements in the literature such as "any cipher can be attacked by exhaustively trying all possible keys", I usually like to call attention to the fact that any cipher (including 56-bit-key DES) can be theoretically secure against any attacker -- even an attacker with unbounded resources -- when the cipher is used within its unicity. Not only the One-Time Pad is theoretically secure, but any cipher can be theoretically secure if used within the unicity distance. Thus, indeed there is a theoretically secure defense even against brute-force attacks, which is to work within the unicity limit of the cipher. And, it works for any cipher that is a good random cipher -- irrespective of key-length or encryption method used. It is also important to note, as the literature has also not been very neat in this regard, that unicity is always referred to the plaintext. However, it may also be applied to indicate the least amount of ciphertext which needs to be intercepted in order to attack the cipher -- within the ciphertext/plaintext granularity. For example, for a simple OTP-cipher, being sloppy works because one byte of ciphertext links back to one byte of plaintext -- so, a unicity of n bytes implies n bytes of ciphertext. For DES, however, the ciphertext must be considered in blocks of 8 bytes -- so, a unicity of n bytes implies a corresponding modular number of 8 bytes. 3. ONLINE REFERENCES [Sha48] Shannon, C. Communication Theory of Secrecy Systems. Bell Syst. Tech. J., vol. 28, pp. 656-715, 1949. See also http://www3.edgenet.net/dcowley/docs.html for readable scanned images of the complete original paper and Shannon's definition of "unicity distance" in page 693. Arnold called my attention to a typeset version of the paper at http://www.cs.ucla.edu/~jkong/research/security/shannon.html. [Sha49] Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J., vol. 27, pp. 379-423, July 1948. See also http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html Anton also made available the following link, with notes he took for Claude Crepeau's crypto course at McGill. See page 24 and following at http://crypto.cs.mcgill.ca/~stiglic/Papers/crypto1.ps (Anton notes that it's not unlikely that there are errors in those notes). Comments are welcome. Cheers, Ed Gerck --------------------------------------------------------------------- The Cryptography Mailing List Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]