On Sat, 02 Nov 2013 14:31:09 -0700, Tim Roberts wrote: > jonas.thornv...@gmail.com wrote: >> >>Well then i have news for you. > > Well, then, why don't you share it? > > Let me try to get you to understand WHY what you say is impossible. [snip reasons]
Expanding on Tim's post... the first scenario Tim gives, applying the compression repeatedly to its own output until you eventually get a single byte, can be overcome if there are data sets that are unchanged by compression. That is, if f() is the compression function: f(f(f(data))) = f(f(data)) == f(data) == data for some values of data. So if you start with some other data, and compress it, then compress it again, and again, eventually you end up with one of these attractors, after which repeated compression doesn't give you any more benefit. [Aside: the attractors aren't necessarily a single point. The attractor could be a cycle of two or more points, f(x) == y and f(y) == x. Or you could even have a "strange attractor", which brings us to chaos theory.] But your second reason, better known as the pigeonhole principle, demonstrates that for any lossless compression method, there must be data sets that actually expand the data. It doesn't matter how cleverly you compress the data, you can't fit 20kg of potatoes in a 10kg bag, so to speak. Suppose your compression algorithm compresses a single byte into a nybble (four bits). There are 256 different input data sets (0x00, 0x01, ... 0xFF) and only 16 different outputs (0x0, 0x1, ... 0xF). There is no way for 256 pigeons to fit in 16 pigeon holes unless you put two or more pigeons in at least one hole. Ergo, if the compression algorithm is lossless, *some* data must be expanded rather than compressed. Alternatively, the compression may be lossy. Or both! Obviously data isn't compressed a byte at a time, that would be silly. But the principle still applies. There is a way to apparently get around these limits: store data externally, perhaps inside the compression application itself. Then, if you just look at the compressed file (the "data.zip" equivalent, although I stress that zip compression is *not* like this), you might think it has shrunk quite a lot. But when you include the hidden data, the compression is not quite so impressive... -- Steven -- https://mail.python.org/mailman/listinfo/python-list