On 07/18/2014 10:45 PM, Charlie Murphy wrote:
>
> Interesting. How could a header change the compression so much?
>
http://en.wiktionary.org/wiki/Shannon_entropy

Ideal compression is based on a known probability mass function (PMF). Actual compression is based on a heuristic PMF. A header changes that heuristic PMF, sometimes quite dramatically, particularly if it uses a different alphabet than the data does.

(As a thought experiment, imagine you knew beforehand that a file contained *either* "Four score and seven years ago" or "Now is the winter of our discontent", and no other options. How many bits would it take to encode that file?)

WMG
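
P.S. A minimal sketch of the point above, in Python. Everything here is an illustrative assumption rather than how any particular compressor works: the 50/50 odds for the two files, the made-up `data` and `header` strings, and the use of simple character counts as the "heuristic PMF". Real compressors use adaptive models, so treat this only as a way to see the entropy numbers move.

```python
import math
from collections import Counter

def entropy_bits(pmf):
    """Shannon entropy in bits of a PMF given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def empirical_pmf(text):
    """Per-character PMF estimated from symbol counts (a stand-in heuristic)."""
    counts = Counter(text)
    total = len(text)
    return {sym: n / total for sym, n in counts.items()}

# Thought experiment: two possible files, assumed equally likely.
two_files = {
    "Four score and seven years ago": 0.5,
    "Now is the winter of our discontent": 0.5,
}
bits_for_file = entropy_bits(two_files)

# Hypothetical data (digits only) and header (letters only): disjoint alphabets.
data = "0123456789" * 100
header = "abcdefghijklmnopqrstuvwxyz"
data_only = entropy_bits(empirical_pmf(data))
with_header = entropy_bits(empirical_pmf(header + data))

print("bits to pick one of two known files:", bits_for_file)
print("bits/char, data only:  ", data_only)
print("bits/char, with header:", with_header)
```

Under the equal-odds assumption, the entropy of the two-file PMF is one bit, no matter how long either text is. In the second half, the letters-only header adds 26 symbols the digit data never uses, so the counted PMF spreads out and the estimated bits per character go up.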