[Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases
I'm pleased to announce updates to the zlib and bzlib packages.

The releases are on Hackage:
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/zlib
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bzlib

What's new
==========

What's new in these releases is that the extended API is slightly nicer. The simple API that most packages use is unchanged.

In particular, these functions have different types:

  compressWith   :: CompressParams   -> ByteString -> ByteString
  decompressWith :: DecompressParams -> ByteString -> ByteString

The CompressParams and DecompressParams types are records of compression/decompression parameters. The functions are used like so:

  compressWith   defaultCompressParams   { ... }
  decompressWith defaultDecompressParams { ... }

There is also a new parameter to control the size of the first output buffer. This lets applications save memory when they happen to have a good estimate of the output size (some apps like darcs know this exactly). With a good estimate, the data is (de)compressed into a single-chunk lazy bytestring, which lets apps convert to a strict bytestring with no extra copying cost.

Future directions
=================

The simple API is very unlikely to change.

The current error handling for decompression is not ideal. It just throws exceptions for failures like bad format or unexpected end of stream. This is a tricky area because streaming behaviour does not mix easily with error handling. One option, which I use in the iconv library, is to have a data type describe the real error conditions, something like:

  data DataStream = Chunk Strict.ByteString Checksum DataStream
                  | Error Error -- for some suitable error type
                  | End Checksum

With suitable fold functions and functions to convert to a lazy ByteString. Then people who care about error handling and streaming behaviour can use that type directly. For example it should be trivial to convert to an iterator style.
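The DataStream idea above can be sketched roughly as follows. This is only an illustration, not the API of zlib or iconv: Checksum and Error are placeholder types (the post leaves both open), and foldDataStream, toLazyByteString and toChunksChecked are hypothetical names chosen for this sketch.

```haskell
import qualified Data.ByteString      as Strict
import qualified Data.ByteString.Lazy as Lazy

-- Placeholder types: the post only says "some suitable error type".
type Checksum = Int
data Error = BadFormat | UnexpectedEnd deriving (Eq, Show)

-- The stream type as written in the post.
data DataStream
  = Chunk Strict.ByteString Checksum DataStream
  | Error Error
  | End Checksum

-- A right fold over the stream, with one handler per constructor.
foldDataStream
  :: (Strict.ByteString -> Checksum -> r -> r)
  -> (Error -> r)
  -> (Checksum -> r)
  -> DataStream
  -> r
foldDataStream chunk err end = go
  where
    go (Chunk bs c rest) = chunk bs c (go rest)
    go (Error e)         = err e
    go (End c)           = end c

-- Streaming conversion: chunks are produced lazily, and an error only
-- turns into an exception when the consumer reaches that point.
toLazyByteString :: DataStream -> Lazy.ByteString
toLazyByteString =
  Lazy.fromChunks
    . foldDataStream (\bs _ rest -> bs : rest) (error . show) (const [])

-- Checked conversion: walks the whole stream and reports the error as
-- a value, giving up streaming in exchange for explicit error handling.
toChunksChecked :: DataStream -> Either Error ([Strict.ByteString], Checksum)
toChunksChecked =
  foldDataStream
    (\bs _ rest -> fmap (\(bss, c) -> (bs : bss, c)) rest)
    Left
    (\c -> Right ([], c))
```

The two conversions show the trade-off mentioned above: the first keeps streaming behaviour but defers failure to an exception, while the second makes the failure explicit but has to see the end of the stream before returning anything.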
People have also asked for a continuation-style API to give more control over dynamic behaviour like flushing the compression state (e.g. in an HTTP server). Unfortunately this does not look easy. The zlib state is mutable and while this can be hidden in a lazy list, it cannot be hidden if we provide access to intermediate continuations. That is because those continuations can be re-run whereas a lazy list evaluates each element at most once (and with suitable internal locking this is even true for SMP).

Background
==========

The zlib and bzlib packages provide functions for compression and decompression in the gzip, zlib and bzip2 formats. Both provide pure functions on streams of data represented by lazy ByteStrings:

  compress, decompress :: ByteString -> ByteString

This makes it easy to use either in memory or with disk or network IO. For example a simple gzip compression program is just:

> import qualified Data.ByteString.Lazy as ByteString
> import qualified Codec.Compression.GZip as GZip
>
> main = ByteString.interact GZip.compress

Or you could lazily read in and decompress a .gz file using:

> content <- GZip.decompress <$> ByteString.readFile file

General
=======

Both packages are bindings to the corresponding C libs, so they depend on those external C libraries (except on Windows where we build a bundled copy of the C lib source code). The compression speed is as you would expect since it's the C lib that is doing all the work.

The zlib package is used in cabal-install to work with .tar.gz files. So it has actually been tested on Windows. It works with all versions of ghc since 6.4 (though it requires Cabal-1.2).

The darcs repos for the development versions live on code.haskell.org.

I'm very happy to get feedback on the API, the documentation or of course any bug reports.

Duncan

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases
Hello Duncan,

Sunday, November 2, 2008, 6:46:00 PM, you wrote:

> People have also asked for a continuation style api to give more control
> over dynamic behaviour like flushing the compression state (eg in a http
> server). Unfortunately this does not look easy.

Can you give more details on these? Maybe I can help.

--
Best regards,
 Bulat mailto:[EMAIL PROTECTED]
Re: [Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases
On Sun, 2008-11-02 at 19:07 +0300, Bulat Ziganshin wrote:
> Hello Duncan,
>
> Sunday, November 2, 2008, 6:46:00 PM, you wrote:
>
> > People have also asked for a continuation style api to give more control
> > over dynamic behaviour like flushing the compression state (eg in a http
> > server). Unfortunately this does not look easy.
>
> Can you give more details on these? Maybe I can help.

For details talk to Johan Tibell <[EMAIL PROTECTED]>

Suppose you're trying to work with a strict block IO strategy, like one of these iterator-style designs. What kind of API would one want to work with that?

The constraint is that for a pure API, the zlib compression state must be used in a single-threaded, non-persistent style. Additionally it would be nice to expose the zlib flush feature. This is tricky in a straightforward design because it involves a branching structure of possible operations, and we cannot split the zlib compression state (at least not cheaply).

If we could do it persistently we could have something like:

  data StreamState
     = OutputAvailable ByteString   -- the output buffer
                       StreamState  -- next state
     | InputRequired (ByteString -> StreamState) -- supply input
                                                 -- or
                     (Flush -> StreamState)      -- flush
     | StreamEnd CheckSum

  data Flush = FlushEnd | FlushSync | FlushFull

  initialState :: StreamState

But obviously we cannot do this because we have to guarantee the single-threaded use of the stream state.

Duncan
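To illustrate the StreamState design from the message above, here is a minimal pure model of it. It does not touch zlib at all: toyState is a hypothetical stand-in "codec" that copies its input to the output and uses the byte count as its checksum, CheckSum is assumed to be an Int, and runStream is a hypothetical driver. Because the model is pure, an intermediate state can be resumed more than once, which is exactly the persistence that the real, mutable zlib state cannot offer cheaply.

```haskell
import qualified Data.ByteString as B

-- Placeholder: the message does not say what CheckSum is.
type CheckSum = Int

data Flush = FlushEnd | FlushSync | FlushFull

-- The state type as written in the message.
data StreamState
  = OutputAvailable B.ByteString StreamState
  | InputRequired (B.ByteString -> StreamState) (Flush -> StreamState)
  | StreamEnd CheckSum

-- Toy codec: echoes each input block and counts the bytes seen so far.
toyState :: Int -> StreamState
toyState n = InputRequired supply flush
  where
    supply bs      = OutputAvailable bs (toyState (n + B.length bs))
    flush FlushEnd = StreamEnd n
    flush _        = toyState n  -- nothing is buffered in this toy

initialState :: StreamState
initialState = toyState 0

-- Driver: feeds the given blocks, then ends the stream, collecting
-- every output block together with the final checksum.
runStream :: [B.ByteString] -> StreamState -> ([B.ByteString], CheckSum)
runStream _ (StreamEnd c) = ([], c)
runStream ins (OutputAvailable out st) =
  let (outs, c) = runStream ins st
  in  (out : outs, c)
runStream (i : is) (InputRequired supply _) = runStream is (supply i)
runStream []       (InputRequired _ flush)  = runStream [] (flush FlushEnd)
```

In this model a caller can hold on to the state returned by supply and continue from it twice with different inputs. With a real zlib stream behind the constructors, the second continuation would observe state already mutated by the first, which is the single-threaded-use problem described in the message.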
Re: [Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases
On Sun, 2 Nov 2008, Duncan Coutts wrote:

> The current error handling for decompression is not ideal. It just
> throws exceptions for failures like bad format or unexpected end of
> stream. This is a tricky area because streaming behaviour does not
> mix easily with error handling.

Maybe
http://hackage.haskell.org/packages/archive/explicit-exception/0.0.1/doc/html/Control-Monad-Exception-Asynchronous.html
can be of help?