On 02/18/2015 02:55 PM, janhein.vanderb...@gmail.com wrote:
> On Wednesday, 18 February 2015 at 17:47:49 UTC+1, Dave Angel wrote:
>> On 02/18/2015 03:59 AM, janhein.vanderb...@gmail.com wrote:
>>
>>> encoding individual integers optimally without any assumptions about
>>> their values.
>>
>> Contradiction in terms.
>>
>> --
>> DaveA
>
> Not.
> Jan-Hein.


Then you had better define your new word "optimal" for us. I decided to try your algorithm for all the values between 0 and 999999. Out of those one million values, the 7bit encoding [1] beat yours for 950081 of them; the rest were tied. Yours was never shorter.

For a uniform distribution over those values, your encoding averaged 3.933568 bytes per value, versus 2.983487 for the 7bit encoding.

For the second and third million, yours are all 4 bytes, while 7bit uses 3. From 2097152 (2**21) on, 7bit also needs 4 bytes, the same as yours. Between 16 and 17 million, yours averages 4.156865 bytes, while 7bit is a constant 4.0.
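Those breakpoints follow directly from the arithmetic of the scheme in footnote [1]: each byte carries 7 payload bits, so a value n needs ceil(n.bit_length() / 7) bytes, with a minimum of one. A quick sketch to check the boundaries (the helper name `len_7bit` is mine, not from either poster's code):

```python
def len_7bit(n):
    """Bytes the 7bit scheme needs for a non-negative integer:
    7 payload bits per byte, so ceil(bit_length / 7), minimum 1."""
    return max(1, -(-n.bit_length() // 7))

# Three bytes cover everything up to 2**21 - 1 = 2097151;
# 2097152 is the first value that needs a fourth byte.
assert len_7bit(999999) == 3
assert len_7bit(2097151) == 3
assert len_7bit(2097152) == 4
```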

After that, I started spot-checking. I went up to 100 billion, and for none of those I tried did your algorithm take fewer bytes than 7bit.


So how is yours optimal?  Over what range of values?

I'm not necessarily doubting it, just challenging you to provide a data sample that actually shows it. And of course, I'm not claiming that 7bit is in any way optimal. You cannot define optimal without first defining the distribution.
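To illustrate that last point with a toy example of my own (not from the thread): the expected length of any variable-length code depends entirely on where the probability mass of the values sits.

```python
import random

random.seed(0)

def len_7bit(n):
    # ceil(bit_length / 7), minimum 1 byte
    return max(1, -(-n.bit_length() // 7))

# Two made-up distributions: one concentrated on small values,
# one concentrated on values that all need four 7bit bytes.
small = [random.randrange(0, 128) for _ in range(10000)]        # all fit in 1 byte
large = [random.randrange(2**21, 2**28) for _ in range(10000)]  # all need 4 bytes

avg_small = sum(map(len_7bit, small)) / len(small)
avg_large = sum(map(len_7bit, large)) / len(large)
assert avg_small == 1.0 and avg_large == 4.0
```

Any encoding that is "optimal" for the first distribution wastes bits on the second, and vice versa, which is why the claim needs a distribution attached.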

[1] By 7bit, I'm referring to the scheme apparently used in MIDI encoding, where 7 bits of each byte hold the value, and the 8th bit is zero, except for the last byte, where the 8th bit is one. So 3 bytes can encode 21 bits, i.e. values up to 2**21 - 1.
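To make the footnote concrete, here is a minimal Python sketch of that 7bit scheme, following the convention exactly as described above (high bit clear on every byte except the last, where it is set); the function names are mine:

```python
def encode_7bit(n):
    """Encode a non-negative integer, 7 value bits per byte; the high
    bit is zero on all bytes except the last, where it is one."""
    if n < 0:
        raise ValueError("only non-negative integers")
    groups = []
    while True:
        groups.append(n & 0x7F)   # take the low 7 bits
        n >>= 7
        if n == 0:
            break
    groups.reverse()              # most significant group first
    groups[-1] |= 0x80            # mark the final byte
    return bytes(groups)

def decode_7bit(data):
    """Inverse of encode_7bit; stops at the byte whose high bit is set."""
    n = 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:              # high bit set: this was the last byte
            return n
    raise ValueError("unterminated 7bit sequence")
```

So, for example, every value from 0 through 127 fits in a single byte, and 2097151 (2**21 - 1) is the largest value that fits in three.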

--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list
