On 07/18/2014 10:45 PM, Charlie Murphy wrote:
> 
> Interesting.  How could a header change the compression so much?
> 

http://en.wiktionary.org/wiki/Shannon_entropy

Ideal compression is based on a known PMF (probability mass function).
Actual compression is based on a heuristic PMF, i.e. the compressor's
running guess at the symbol probabilities. A header changes that
heuristic PMF, sometimes quite dramatically, particularly if it uses a
different alphabet than the data does.
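
A rough sketch of that effect, as a toy static model rather than a real
adaptive compressor. The strings and the helper names (pmf,
bits_per_symbol) are made up for illustration. It compares the ideal
cost of coding the data under its own PMF with the cost under a PMF
estimated from header plus data, where the header uses a different
alphabet:

```python
import math
from collections import Counter

def pmf(text):
    """Empirical probability mass function of the symbols in text."""
    counts = Counter(text)
    total = len(text)
    return {sym: n / total for sym, n in counts.items()}

def bits_per_symbol(data_pmf, model_pmf):
    """Expected code length (cross-entropy, in bits) when symbols drawn
    from data_pmf are coded with an ideal code built for model_pmf."""
    return -sum(p * math.log2(model_pmf[s]) for s, p in data_pmf.items())

data = "abababab"      # hypothetical payload, alphabet {a, b}
header = "XYZXYZXY"    # hypothetical header, alphabet {X, Y, Z}

true_pmf = pmf(data)              # what an ideal coder would know
skewed_pmf = pmf(header + data)   # stand-in for the heuristic PMF

ideal = bits_per_symbol(true_pmf, true_pmf)     # Shannon entropy of data
actual = bits_per_symbol(true_pmf, skewed_pmf)  # cost under the skewed model

print(f"ideal:  {ideal:.2f} bits/symbol")
print(f"actual: {actual:.2f} bits/symbol")
```

With these made-up strings the header's symbols take half of the
probability mass, so each data symbol costs 2 bits instead of 1.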

(As a thought experiment, imagine you knew beforehand that a file
contained *either* "Four score and seven years ago" *or* "Now is the
winter of our discontent", and nothing else. How many bits would it
take to encode that file?)
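
A minimal sketch of the thought experiment, assuming the sender and the
receiver both hold the same two-entry table (MESSAGES, encode and decode
are illustrative names only). With only two possible files, the encoder
just sends the index of the one it has:

```python
import math

# The complete set of files that can ever occur, known to both sides.
MESSAGES = [
    "Four score and seven years ago",
    "Now is the winter of our discontent",
]

def encode(text):
    """Return the index of text in the shared table (0 or 1)."""
    return MESSAGES.index(text)

def decode(index):
    """Recover the full text from its index."""
    return MESSAGES[index]

# Bits needed to tell the entries apart with a fixed-length code.
bits_needed = math.ceil(math.log2(len(MESSAGES)))

for message in MESSAGES:
    code = encode(message)
    assert decode(code) == message
    print(f"{code} <- {message!r}")

print(f"bits needed: {bits_needed}")
```

The length of the text does not matter here. Only the number of
alternatives (and, for an ideal code, their probabilities) does.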

WMG
