On Fri, Oct 31, 2014 at 2:55 PM, David Haller <gen...@dhaller.de> wrote:
>
> On Fri, 31 Oct 2014, Rich Freeman wrote:
>
>>I can't imagine that any tool will do much better than something like
>>lzo, gzip, xz, etc. You'll definitely benefit from compression though
>>- your text files full of digits are encoding 3.3 bits of information
>>in an 8-bit ascii character and even if the order of digits in pi can
>>be treated as purely random just about any compression algorithm is
>>going to get pretty close to that 3.3 bits per digit figure.
>
> Good estimate:
>
> $ calc '101000/(8/3.3)'
>         41662.5
>
> and I get from (lzip):
>
> $ calc 44543*8/101000
>         3.528... (bits/digit)
>
> to zip:
>
> $ calc 49696*8/101000
>         ~3.93 (bits/digit)
Actually, I'm surprised how far off of this the various methods are. I
was expecting SOME overhead, but not this much.

A fairly quick algorithm would be to encode every possible set of 96
digits into a 40-byte code (that is just a straight decimal-to-binary
conversion). Since 10^96 < 2^320, any 96-digit block fits in 320 bits =
40 bytes. Then read a "word" at a time and translate it. This will only
waste 320/96 - log2(10) ~= 0.011 bits per digit.

-- 
Rich
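(A sketch of what such a packer might look like, not anything from the
thread itself - function names and block handling are my own. It packs
each 96-digit block into a 40-byte big-endian word and unpacks it again,
relying on 10^96 < 2^320.)

```python
# Hypothetical sketch of the 96-digits-to-40-bytes scheme described above.
# 10**96 < 2**320, so every block of 96 decimal digits fits in 40 bytes;
# the cost is 320/96 ~= 3.333 bits/digit vs. the ideal log2(10) ~= 3.322,
# i.e. about 0.011 wasted bits per digit.

def pack_digits(digits: str) -> bytes:
    """Encode a string of exactly 96 decimal digits as a 40-byte word."""
    assert len(digits) == 96 and digits.isdigit()
    return int(digits).to_bytes(40, "big")

def unpack_digits(word: bytes) -> str:
    """Decode a 40-byte word back to its 96-digit string.

    zfill restores any leading zeros lost in the integer round-trip.
    """
    assert len(word) == 40
    return str(int.from_bytes(word, "big")).zfill(96)

if __name__ == "__main__":
    block = ("0123456789" * 10)[:96]   # stand-in for 96 digits of pi
    word = pack_digits(block)
    print(len(word))                   # 40 bytes per 96 digits
    print(unpack_digits(word) == block)
```

A file of digits would just be processed one 96-digit block at a time
(the final short block would need separate handling, e.g. padding plus a
stored length).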