> > Almost all 'real users' just use Codec.Compression.GZip. It's very
> > fast, very compositional, and (perhaps surprisingly) almost as
> > effective as application-specific schemes.
>
> I was about to say the same thing. So, so much simpler to use Duncan's
> carefully written zlib binding:
>
>     import Data.Binary
>     import Codec.Compression.GZip
>     import qualified Data.ByteString.Lazy as L
>
>     main = L.writeFile "log.gz" . compress . encode $ [1..10::Int]
>
> Simple, purely functional, fast.
>
> -- Don
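(For completeness, reading that file back is the mirror image; a minimal
sketch, assuming "log.gz" was written by the quoted program, using the
documented inverses decompress and decode:)

    import Data.Binary (decode)
    import Codec.Compression.GZip (decompress)
    import qualified Data.ByteString.Lazy as L

    main :: IO ()
    main = do
      bytes <- L.readFile "log.gz"
      -- decompress undoes compress; decode undoes encode
      print (decode (decompress bytes) :: [Int])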
I have several types of symbols, and for each type the probabilities are
so predictable that they could even be hard-coded. And upon completion I
can be sure the first two questions will be "Can we make it smaller?" and
"Can we make it faster?"

GZip (while very cool) is adaptive and general-purpose: it builds its
frequency tables as it goes and ignores the structure in the data that I
should be able to exploit. With a great deal of trouble, it must be
possible to write something in C that runs faster and compresses better
than gzip for this particular data. With the probability structure known
in advance, many of the steps gzip takes are simply unnecessary. Besides
this, gzip assumes only an arbitrary sequence of bytes, while my data are
far more structured than that.

Given the high performance achieved with idiomatic Haskell and the
ByteString and Binary libraries, I would expect a similar approach to
work for writing individual bits (see the sketch below). Then it would be
(relatively) easy to write custom compression routines in Haskell with
reasonable performance; I don't think that can be said of any other
language.

Chad
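(A minimal sketch of the kind of fixed-model bit-writer described above.
The names packBits and encodeSym, and the three-symbol prefix code, are
invented for illustration; a production version would accumulate into a
strict buffer or builder rather than a list of Bool, but the shape is
the point:)

    import Data.Bits (shiftL, (.|.))
    import Data.Word (Word8)
    import qualified Data.ByteString.Lazy as L

    -- Pack a list of bits (most significant bit first) into bytes,
    -- zero-padding the final byte.
    packBits :: [Bool] -> L.ByteString
    packBits = L.pack . go
      where
        go :: [Bool] -> [Word8]
        go [] = []
        go bs = let (chunk, rest) = splitAt 8 bs
                in toByte (take 8 (chunk ++ replicate 7 False)) : go rest
        toByte = foldl (\acc b -> (acc `shiftL` 1) .|. (if b then 1 else 0)) 0

    -- A hard-coded prefix code for a hypothetical three-symbol alphabet
    -- with fixed, known probabilities (roughly 1/2, 1/4, 1/4): the kind
    -- of model that never needs gzip's adaptive frequency tables.
    encodeSym :: Char -> [Bool]
    encodeSym 'a' = [False]              -- "0"
    encodeSym 'b' = [True, False]        -- "10"
    encodeSym _   = [True, True]         -- "11"

    main :: IO ()
    main = L.writeFile "out.bin" (packBits (concatMap encodeSym "abacabaa"))

Once bits can be written cheaply, a hard-coded prefix code is just a few
lines of pattern matching.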