danceswithnumb...@gmail.com wrote:

Compress this:

4135124325

Bin to dec...still very large
11110110
01111000
11111101
01100101

Wait right there! You're cheating by dropping off leading
0 bits. The maximum value of a 10 digit decimal number is
9999999999, which in hex is 2540be3ff. That's 34 bits.
That's in line with the theoretical number of bits needed:

log2(10) * 10 = 33.219

So the binary version of your number above is really

        00
  11110110
  01111000
  11111101
  01100101

You may think you can get away without storing or
transmitting those leading 0 bits, because the decoder
can always pad out the data as needed.

But to do that, the decoder needs to know *how many*
bits to pad out to. That information somehow needs to
be part of the encoded data.

You need to imagine you're sending the data to someone
over a wire. The only thing you can send along the wire
are ones and zeroes. You can't convey any information
by timing, or turning off the power, or anything like
that. How is the receiver going to know when he's got
the whole message?

There are only two possibilities. Either you decide
in advance that all messages will be exactly the same
length, which in this case means always sending
exactly 34 bits. Or you add some extra bits to the
message -- prepend the length in binary, or add an
end-of-message code at the end, or something like
that.

Whatever you do, you'll find that *on average* you
will need *at least* 34 bits to be able to represent
all possible 10-digit decimal numbers. Some might
be shorter, but then others will be longer, and
the average won't be less than 34.

New compression method:

11000101
11000111
01001111

A full byte less than bin.

You need to be *very* careful about what you're claiming here.
Are you saying that your algorithm compresses *all possible*
sequences of 10 decimal digits to 3 bytes or less? Or can
some of them come out longer?

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to