On Fri, Nov 8, 2013 at 1:43 PM, R. Michael Weylandt <michael.weyla...@gmail.com> <michael.weyla...@gmail.com> wrote: > Chris's point is more subtle: the typical computer will store the number > 65536 in a single byte, but it will also store 4 and 8 in one byte. So if > your choice is between sending "65536" and "(4,8)", you actually loose > efficiency in your scheme. Don't think in decimal, but in terms of > information needing transfer.
Well, 65536 won't fit in a single byte, nor even in two (only just). A typical binary representation of 65536 would take 3 bytes, or 4 for a common integer type: 0x00 0x01 0x00 0x00 (big-endian). So it would be possible to represent that more compactly, if you deem one byte each for the base and the exponent: 0x04 0x08. However, that system allows you to represent just 62,041 possible numbers: >>> decomp={} >>> for base in range(256): for exp in range(256): decomp[base**exp]=base,exp >>> len(decomp) 62041 The trouble is, these numbers range all the way up from 0**0 == 0 to 255**255 == uhh... >>> 255**255 46531388344983681457769984555620005635274427815488751368772861643065273360461098097690597702647394229975161523887729348709679192202790820272357752329882392140552515610822058736740145045150003072264722464746837070302159356661765043244993104360887623976285955058200326531849137668562738184397385361179287309286327712528995820702180594566008294593820621769951491324907014215176509758404760451335847252744697820515292329680698271481385779516652518207263143889034764775414387732372812840456880885163361037485452406176311868267428358492408075197688911053603714883403374930891951109790394269793978310190141201019287109375 which would decompress to (obviously) 255 bytes. So you can represent just over sixty thousand possible 255-byte strings in two bytes with this system. To generalize it, you'd need to devote a bit somewhere to saying "There are more to add in". Let's say you do this on the exponent byte (high bit for convenience), so you have 0x04 0x08 means 65536, and 0x04 0x88 0x01 0x01 means 65536+1 = 65537. Now we have something that generalizes; but the efficiency is gone - and there are too many ways to encode the same value. (Bear in mind, for instance, that 0x01 0xNN for any NN will still just represent 1, and 0x00 0xNN will represent 0. That's wasting a lot of bits.) The application I can see for this sort of thing is not data compression, but puzzles. There are lots of puzzles that humans find enjoyable that represent an entire grid with less data than it contains - for one popular example, look at Sudoku. You have a 9x9 grid where each cell could contain any of nine digits, which means a theoretical information density of 9**81; but the entire grid can be described by a handful of digits and heaps of blank space. This could be a similarly-fun mathematical puzzle: 3359232 can be represented as B1**E1 + B2**E2, where all numbers are single-digit. Find B1, B2, E1, and E2. In this case, you're guaranteed that the end result is shorter (four digits), but it's hardly useful for general work. ChrisA -- https://mail.python.org/mailman/listinfo/python-list