[Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases

2008-11-02 Thread Duncan Coutts
I'm pleased to announce updates to the zlib and bzlib packages.

The releases are on Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/zlib
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/bzlib

What's new
==========

What's new in these releases is that the extended API is slightly nicer.
The simple API that most packages use is unchanged.

In particular, these functions have different types:
compressWith   :: CompressParams   -> ByteString -> ByteString
decompressWith :: DecompressParams -> ByteString -> ByteString

The CompressParams and DecompressParams types are records of
compression/decompression parameters. The functions are used like so:

compressWith   defaultCompressParams { ... }
decompressWith defaultDecompressParams { ... }

There is also a new parameter to control the size of the first output
buffer. This lets applications save memory when they happen to have a
good estimate of the output size (some apps like darcs know this
exactly). By getting a good estimate and (de)compressing into a
single-chunk lazy bytestring this lets apps convert to a strict
bytestring with no extra copying cost.
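
As a rough illustration of the record-update style and the buffer-size parameter, here is a minimal sketch. It assumes the `compressBufferSize` and `decompressBufferSize` record fields exported by Codec.Compression.GZip, and `Lazy.toStrict` from a newer bytestring (older versions would use `Strict.concat . Lazy.toChunks`). The helper names `compressSized` and `decompressToStrict` are made up for this example.

```haskell
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString        as Strict
import qualified Data.ByteString.Lazy   as Lazy

-- | Compress, sizing the first output buffer from a caller-supplied
-- estimate of the compressed size (in bytes).
compressSized :: Int -> Lazy.ByteString -> Lazy.ByteString
compressSized estimate =
  GZip.compressWith
    GZip.defaultCompressParams { GZip.compressBufferSize = estimate }

-- | Decompress when the uncompressed size is known (or well estimated),
-- then convert the result to a strict ByteString. If the estimate is
-- large enough, the output should fit in the first buffer.
decompressToStrict :: Int -> Lazy.ByteString -> Strict.ByteString
decompressToStrict expectedSize =
  Lazy.toStrict
    . GZip.decompressWith
        GZip.defaultDecompressParams { GZip.decompressBufferSize = expectedSize }
```

All other parameters keep their defaults, which is the point of updating the default record rather than building one from scratch.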

Future directions
=================

The simple API is very unlikely to change.

The current error handling for decompression is not ideal. It just
throws exceptions for failures like bad format or unexpected end of
stream. This is a tricky area because streaming behaviour does not
mix easily with error handling.

One option, which I use in the iconv library, is to have a data type
describe the real error conditions, something like:

data DataStream = Chunk Strict.ByteString Checksum DataStream
                | Error Error -- for some suitable error type
                | End Checksum

With suitable fold functions and functions to convert to a lazy
ByteString. Then people who care about error handling and streaming
behaviour can use that type directly. For example it should be trivial
to convert to an iterator style.
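
To make the idea concrete, here is a small sketch of such a type with one fold and two conversions. It is only an illustration: `Checksum` is a placeholder `Int`, `StreamError` stands in for "some suitable error type", and the names `foldStream`, `toLazyByteString` and `toEither` are invented here, not taken from zlib or iconv.

```haskell
import qualified Data.ByteString       as Strict
import qualified Data.ByteString.Char8 as Char8
import qualified Data.ByteString.Lazy  as Lazy

type Checksum = Int                      -- placeholder checksum type
newtype StreamError = StreamError String -- placeholder error type
  deriving (Eq, Show)

data DataStream
  = Chunk Strict.ByteString Checksum DataStream
  | Error StreamError
  | End Checksum

-- | The fold for DataStream: one handler per constructor.
foldStream
  :: (Strict.ByteString -> Checksum -> r -> r)
  -> (StreamError -> r)
  -> (Checksum -> r)
  -> DataStream -> r
foldStream chunk err end = go
  where
    go (Chunk bytes sum' rest) = chunk bytes sum' (go rest)
    go (Error e)               = err e
    go (End sum')              = end sum'

-- | Streaming conversion: chunks are produced lazily and an Error turns
-- into a call to 'error' only when the consumer reaches that point.
toLazyByteString :: DataStream -> Lazy.ByteString
toLazyByteString =
  Lazy.fromChunks
    . foldStream (\bytes _ rest -> bytes : rest)
                 (\(StreamError msg) -> error msg)
                 (const [])

-- | Non-streaming conversion: the whole stream is inspected before any
-- data is returned, so failures are reported explicitly.
toEither :: DataStream -> Either StreamError ([Strict.ByteString], Checksum)
toEither =
  foldStream (\bytes _ rest -> fmap (\(bs, c) -> (bytes : bs, c)) rest)
             Left
             (\c -> Right ([], c))

-- Two hand-written sample streams (checksums are just byte counts here).
sampleGood, sampleBad :: DataStream
sampleGood = Chunk (Char8.pack "hello ") 6 (Chunk (Char8.pack "world") 11 (End 11))
sampleBad  = Chunk (Char8.pack "hello ") 6 (Error (StreamError "unexpected end of stream"))
```

The two conversions show the trade-off: `toLazyByteString` keeps streaming but hides errors in the data, while `toEither` exposes errors but has to consume the whole stream first.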

People have also asked for a continuation style API to give more control
over dynamic behaviour like flushing the compression state (e.g. in an HTTP
server). Unfortunately this does not look easy. The zlib state is
mutable and while this can be hidden in a lazy list, it cannot be hidden
if we provide access to intermediate continuations. That is because
those continuations can be re-run whereas a lazy list evaluates each
element at most once (and with suitable internal locking this is even
true for SMP).

Background
==========

The zlib and bzlib packages provide functions for compression and
decompression in the gzip, zlib and bzip2 formats. Both provide pure
functions on streams of data represented by lazy ByteStrings:

compress, decompress :: ByteString -> ByteString

This makes it easy to use either in memory or with disk or network IO.
For example a simple gzip compression program is just:

> import qualified Data.ByteString.Lazy as ByteString
> import qualified Codec.Compression.GZip as GZip
>
> main = ByteString.interact GZip.compress

Or you could lazily read in and decompress a .gz file using:

> content <- GZip.decompress <$> ByteString.readFile file

General
=======

Both packages are bindings to the corresponding C libs, so they depend
on those external C libraries (except on Windows where we build a
bundled copy of the C lib source code). The compression speed is as you
would expect since it's the C lib that is doing all the work.

The zlib package is used in cabal-install to work with .tar.gz files. So
it has actually been tested on Windows. It works with all versions of
ghc since 6.4 (though it requires Cabal-1.2).

The darcs repos for the development versions live on code.haskell.org.

I'm very happy to get feedback on the API, the documentation or of
course any bug reports.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases

2008-11-02 Thread Bulat Ziganshin
Hello Duncan,

Sunday, November 2, 2008, 6:46:00 PM, you wrote:

> People have also asked for a continuation style api to give more control
> over dynamic behaviour like flushing the compression state (eg in a http
> server). Unfortunately this does not look easy.

can you give more details on this? maybe i can help


-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]



Re: [Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases

2008-11-02 Thread Duncan Coutts
On Sun, 2008-11-02 at 19:07 +0300, Bulat Ziganshin wrote:
> Hello Duncan,
> 
> Sunday, November 2, 2008, 6:46:00 PM, you wrote:
> 
> > People have also asked for a continuation style api to give more control
> > over dynamic behaviour like flushing the compression state (eg in a http
> > server). Unfortunately this does not look easy.
> 
> can you give more details on this? maybe i can help

For details talk to Johan Tibell <[EMAIL PROTECTED]>

Suppose you're trying to work with a strict block IO strategy, like one
of these iterator style designs. What kind of api would one want to work
with that?

The constraint is that for a pure api, the zlib compression state must
be used in a single threaded, non-persistent style.

Additionally it would be nice to expose the zlib flush feature. This is
tricky in a straightforward design because it involves a branching
structure of possible operations, and we cannot split the zlib
compression state (at least not cheaply).

If we could do it persistently we could have something like:

data StreamState = OutputAvailable
                     ByteString   -- the output buffer
                     StreamState  -- next state
                 | InputRequired
                     (ByteString -> StreamState) -- supply input
                     -- or
                     (Flush      -> StreamState) -- flush
                 | StreamEnd CheckSum

data Flush = FlushEnd
           | FlushSync
           | FlushFull

initialState :: StreamState

But obviously we cannot do this because we have to guarantee the single
threaded use of the stream state.
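
To show what the persistent version would permit, here is a toy sketch of that interface. It does not touch zlib at all: the "compressor" is a pure stand-in that buffers input and emits it unchanged on flush, and `CheckSum` is just a byte count. The helpers `feed`, `flushWith` and `drain` are hypothetical names for this example.

```haskell
import qualified Data.ByteString.Char8 as BS

type CheckSum = Int  -- placeholder: total number of input bytes

data Flush = FlushEnd | FlushSync | FlushFull

data StreamState
  = OutputAvailable BS.ByteString StreamState
  | InputRequired (BS.ByteString -> StreamState)  -- supply input
                  (Flush -> StreamState)          -- or flush
  | StreamEnd CheckSum

-- | Toy pure stream: buffers input, emits the buffer on any flush,
-- and ends after FlushEnd.
initialState :: StreamState
initialState = go BS.empty 0
  where
    go pending total = InputRequired supply flush
      where
        supply chunk   = go (pending `BS.append` chunk) (total + BS.length chunk)
        flush FlushEnd = emit (StreamEnd total)
        flush _        = emit (go BS.empty total)
        emit next
          | BS.null pending = next
          | otherwise       = OutputAvailable pending next

-- | Supply input if the state is waiting for it; otherwise leave it alone.
feed :: BS.ByteString -> StreamState -> StreamState
feed chunk (InputRequired supply _) = supply chunk
feed _     s                        = s

-- | Request a flush if the state is waiting for input; otherwise leave it alone.
flushWith :: Flush -> StreamState -> StreamState
flushWith f (InputRequired _ flush) = flush f
flushWith _ s                       = s

-- | Collect pending output until the stream wants input or has ended.
drain :: StreamState -> ([BS.ByteString], StreamState)
drain (OutputAvailable out next) = let (outs, s) = drain next in (out : outs, s)
drain s                          = ([], s)
```

With a pure state, the same intermediate value can be continued in two different ways, for example `feed (BS.pack "x") s` and `feed (BS.pack "y") s` for one shared `s`, and both branches stay valid. That reuse is exactly what a mutable zlib state behind the same interface could not support.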

Duncan



Re: [Haskell-cafe] ANNOUNCE: zlib and bzlib 0.5 releases

2008-11-02 Thread Henning Thielemann


On Sun, 2 Nov 2008, Duncan Coutts wrote:

> The current error handling for decompression is not ideal. It just
> throws exceptions for failures like bad format or unexpected end of
> stream. This is a tricky area because streaming behaviour does not
> mix easily with error handling.


Maybe
http://hackage.haskell.org/packages/archive/explicit-exception/0.0.1/doc/html/Control-Monad-Exception-Asynchronous.html
can be of help?
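
The general idea behind that suggestion, as far as I understand it, is to pair a lazily produced result with the exception (if any) that cut it short. The sketch below uses a local stand-in type and a toy "decoder" (it reverses each chunk and fails on an empty one); the type and the `decodeChunks` function are assumptions for illustration, so check the package documentation for the real definitions.

```haskell
import qualified Data.ByteString.Char8 as BS

-- | Local stand-in: a result that may be incomplete, plus the exception
-- that stopped production (Nothing if it completed normally).
data Exceptional e a = Exceptional
  { exception :: Maybe e
  , result    :: a
  }

-- | Toy decoder: reverses each chunk, stops with an error at the first
-- empty chunk. The result list is produced lazily, so a consumer can
-- use the decoded prefix before the exception field is known.
decodeChunks :: [BS.ByteString] -> Exceptional String [BS.ByteString]
decodeChunks [] = Exceptional Nothing []
decodeChunks (c : cs)
  | BS.null c = Exceptional (Just "empty chunk") []
  | otherwise =
      let rest = decodeChunks cs
      in  Exceptional (exception rest) (BS.reverse c : result rest)
```

A consumer would stream `result` as usual and inspect `exception` once it has reached the end, which keeps streaming and error reporting separate.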