On Thu, Jun 25, 2015 at 6:33 PM, Chris Angelico <ros...@gmail.com> wrote: > On Fri, Jun 26, 2015 at 1:26 AM, Jon Ribbens > <jon+use...@unequivocal.co.uk> wrote: >>> There are only 256 possible values for n, one of which doesn't transform the >>> data at all (ROT-0). If you're thinking of attacking this by pencil and >>> paper, 255 transformations sounds like a lot. For a computer, that's barely >>> harder than a single transformation. >> >> Well, it means you need to send 256 times as much data, which is a >> start. If you're instead using a 256-byte translation table then >> an attack becomes utterly impractical. >> > > Utterly impractical? Maybe, if you attempt a pure brute-force approach > - there are 256! possible translation tables, which is roughly e500 > attempts [1], and at roughly four a microsecond [2] that'd still take > a ridiculously long time. But there are two gigantic optimizations you > could do. Firstly, there are frequency-based attacks, and byte value > duplicates will tell you a lot - classic cryptographic work. And > secondly, you can simply take the first few bytes of a file - let's > say 16, although a lot of files can be recognized in less than that. > Even if there are no duplicate bytes, that'd be a maximum of 16! > translation tables that truly matter, or just 2e13. At the same speed, > that makes about a million seconds of computing time required. Divide > that across a bunch of separate computers (the job is embarrassingly > parallel after all), and you could get that result pretty easily. Cut > the prefix to just 8 bytes and you have a mere 40K encryption keys to > try - so quick that you wouldn't even see it happen. Nope, a simple > substitution cipher is still not secure. Even the famous Enigma > machine was a lot more than just letter-for-letter substitution - a > double letter in the cleartext wouldn't be represented by a double > letter in the result - and once the machine's secrets were figured > out, the day's key could be reassembled fairly readily.
You're making the same mistake that Steven did in misunderstanding the threat model. The goal isn't to prevent the attacker from working out the key for a file that has already been obfuscated. Any real data that might be exposed by a vulnerability in the server is presumed to have already been strongly encrypted by the user. The goal is to prevent the attacker from guessing a key that hasn't even been generated yet, which could be exploited to engineer the obfuscated content into something malicious. There are no frequency-based attacks possible here, because you can't do frequency analysis on the result of a key that hasn't even been generated yet. Assuming that you have no attack on the key generation itself, the best you can do is send a file deobfuscated with a random key and hope that the recipient randomly chooses the same key; the odds of that happening are 1 in 256!. That said, I do see a potential weakness here: if the attacker can create a malicious payload using only a subset of the 256 possible byte values, then the odds of getting a correct key are increased, since multiple keys will work. For an extreme example, if the attacker can manage to craft a malicious payload that uses only the two byte values 32 and 47, then the probability of getting a key that will obfuscate to that is increased to 1 in 256! / 254!, or 1 in 65280. If they distribute 65280 copies of that payload to various recipients, then they can expect that one recipient on average will get the payload in its malicious form. -- https://mail.python.org/mailman/listinfo/python-list