On 3 November 2013 03:17, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote:
>
>> jonas.thornv...@gmail.com wrote:
>>>
>>> Well then i have news for you.
>>
>> Well, then, why don't you share it?
>>
>> Let me try to get you to understand WHY what you say is impossible.
> [snip reasons]
>
> But your second reason, better known as the pigeonhole principle,
> demonstrates that for any lossless compression method, there must be
> data sets that actually expand the data. It doesn't matter how cleverly
> you compress the data; you can't fit 20kg of potatoes in a 10kg bag, so
> to speak. Suppose your compression algorithm compresses a single byte
> into a nybble (four bits). There are 256 different input data sets
> (0x00, 0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ...
> 0xF). There is no way for 256 pigeons to fit in 16 pigeonholes unless
> you put two or more pigeons in at least one hole. Ergo, if the
> compression algorithm is lossless, *some* data must be expanded rather
> than compressed.
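The pigeonhole argument quoted above can be checked mechanically: whatever function you pick from bytes to nybbles, at least two inputs must collide, so decompression cannot be unambiguous. A quick sketch (the `compress` function here is an arbitrary stand-in, not anyone's real algorithm):

```python
def compress(byte):
    # Any function from a byte (0-255) to a nybble (0-15) will do;
    # the argument does not depend on which one we pick.
    return byte % 16

# Group the 256 possible inputs by their compressed output.
outputs = {}
for byte in range(256):
    outputs.setdefault(compress(byte), []).append(byte)

# Every nybble value is hit by more than one byte, so the mapping
# cannot be inverted: the "compression" is necessarily lossy.
collisions = {nyb: srcs for nyb, srcs in outputs.items() if len(srcs) > 1}
print(len(collisions))  # 16 -- every output value has colliding inputs
```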
You have to be careful to make this totally rigorous, too. Note that I *can* make a "compression" algorithm that takes any length-n sequence, compresses all but one sequence by at least one bit, and never expands the data. For n = 2:

"00" -> ""
"01" -> "0"
"10" -> "1"
"11" -> "00"

This, obviously, is just 'cause the length is an extra piece of data, but sometimes you have to store that anyway ;). So if I have a list of N length-Y lists containing only 1s and 0s, I can genuinely compress the whole structure by N log2 Y bits.

--
https://mail.python.org/mailman/listinfo/python-list
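The length-2 mapping in the post above is easy to verify in Python: it is injective, no codeword is longer than its input, and three of the four inputs shrink by a bit. The catch, as the post says, is that it is only decodable because the output *length* itself carries information -- you must already know where each codeword ends:

```python
# The mapping from the post: length-2 bit strings to variable-length codes.
code = {"00": "", "01": "0", "10": "1", "11": "00"}

# Injective: distinct inputs get distinct codewords, so it is lossless
# *given the boundaries* -- the length is the extra piece of data.
assert len(set(code.values())) == len(code)

# Never expands: no codeword is longer than its 2-bit input...
assert all(len(v) <= 2 for v in code.values())
# ...and all but "11" are strictly shorter.
assert sum(len(v) < 2 for v in code.values()) == 3

decode = {v: k for k, v in code.items()}
print(decode["0"])  # prints 01
```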