Re: [Haskell-cafe] Fun with the ST monad

2011-02-26 Thread Stephen Tetley
Hi wren

Thanks for that explanation - it's by far the clearest description of
iteratees / enumerators I've seen.

Best wishes

Stephen

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-26 Thread Kevin Quick

In part to help solidify my own understanding and usage, I wrote up
the following which shows a comparison of processing an input file.
Andrew Coppin originally posed the issue concerning strictness imposed
by using the ST monad for processing an input file.

This literate example shows a comparison of processing a file using:
   1. the ST monad
   2. the ST monad with Luke Palmer's suggested laziness
   3. the State monad
   4. a direct Iteratee (from John Millikin's Enumerator package)
   5. the same Iteratee in Monad form
   6. another slight variation of the Iteratee in Monad form


First, lets get the basics taken care of:


import System.IO
import System.Environment
import Data.Word
import Data.Bits
import qualified Data.ByteString as B
import Control.Applicative ( (<$>) )
import Control.Monad.Trans.Class (lift)
import Control.Monad.IO.Class
import Control.Monad.ST.Lazy
import Data.STRef.Lazy
import Control.Monad.Trans.State.Lazy
import qualified Data.Enumerator as E
import Data.Enumerator ( ($$) )
import qualified Data.Enumerator.Binary as EB
import qualified Data.Enumerator.List as EL


This example is intended to show the effects of lazy or strict
processing of a file, so an input file is needed.


inp = "input.example"


This input file can contain whatever you'd like, but for my testing I

simply created a 5MB file of zeros via:

   $ dd if=/dev/zero of=input.example count=1
   $ ls -sh input.example
   4.9M input.example

The output file will use the following base name, with the number of
the processing mode appended.


oup = "output.example"


The stats output of ghc will be used to compare the different
processing modes, so only one process will be performed each time the
application is run.  The processing mode desired will be input as a
command-line parameter, defaulting to the first mode.


main = do tna <- getArgs
  let tn = read $ head $ tna ++ ["1"]
  case tn of
6 -> testE 6 transform6
5 -> testE 5 transform5
4 -> testE 4 transform4
3 -> testT 3 transform3
2 -> testT 2 transform2
_ -> testT 1 transform1


To build and run this example (assuming this literate source is saved
as fproc.lhs):

  $ ghc -o fproc --make fproc.lhs && for N in $(seq 1 5) ; do time ./fproc $N 
+RTS -t -RTS ; done


That's all the basic setup out of the way.

The actual processing of the file is irrelevant other than needing to
remember previous input to process the current input.  In my example
each byte is usually combined with the previous byte to determine the
output byte.  In the ST and State monad forms, the previous byte value
is stored in the state portion of the monad.

The ST form is my interpretation of Andrew's original intent.


transform1 xs = runST (newSTRef 0 >>= work xs)



where work [] _ = return []
  work (e:es) s = do n <- readSTRef s
 writeSTRef s $ shiftR e 4
 let r = if e < 32 then e else n+e
 (r :) <$> work es s


To run this with standardized file processing, ByteString -> Word8
conversion, and output, main uses the testT wrapper.  Hopefully all
the pack and unpack operations are fusing and I haven't skewed the
results by introducing strictness at this level.


testT n t = let oun = oup ++ show n
op = B.pack . t . B.unpack
in print n >> op <$> B.readFile inp >>= B.writeFile oun


My output from this is:

./fproc 1 +RTS -t
1
<>

real0m14.998s
user0m13.650s
sys 0m1.333s

This is a processing rate of about 333KB/s, and memory consumption is
quite high, despite lazy processing.  Note that this is GHC 6.12.3, so
it doesn't have the IO performance updates present in 7.x.

Just to verify that there was laziness, I changed the imports from
Control.Monad.ST.Lazy and Data.STRef.Lazy to the .Strict versions and
got this:

./fproc 1 +RTS -t
1
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
<>

real0m10.351s

user0m9.865s
sys 0m0.455s


Luke Palmer recommended some laziness techniques.  Notably I think he
added strictness to the STRef update value computation and used fmap
(x:) to yield a value prior to the recursion.  I don't know if the
latter is also achieved by Applicative's <$> that I used above, but
here is the updated version:


transform2 xs = runST (newSTRef 0 >>= work xs)
where work [] _ = return []
  work (e:es) s = do n <- readSTRef s
 writeSTRef s $! shiftR e 4
 let r = if e < 32 then e else n+e
 fmap (r :) $ work es s



This yields nearly identical results (actually slightly worse, but
that may be within the measuring variance):

./fproc 2 +RTS -t
2
<>

real0m15.378s
user0m13.985s
sys 0m1.346s


And just to verify that the performance is not unique to the ST monad,
here's the same thing with the State m

Re: [Haskell-cafe] Fun with the ST monad

2011-02-25 Thread wren ng thornton

On 2/25/11 2:24 PM, Andrew Coppin wrote:

I've heard much about this "iteratee" things, but I've never looked into
what the hell it actually is.

Today I had a look at TMR #16, which is an explanation which I can just
about follow. It seems that it's actually a kind of fold - not unlike
the "streams" of the stream-fusion library (which is something like what
I thought I might end up needing). It seems to handle *input* very
nicely, but I don't see much in the way of handling *output* well. (Then
again, iteratee is just too complex to really comprehend properly.)


In order to output a "stream" you want to use an "enumeratee":

enumerator -- a "source"
* Consumes: a standard value, e.g. a FilePath or Fd
* Produces: a stream value

enumeratee -- a "pipe"
* Consumes: a stream value
* Produces: a stream value

iteratee -- a "sink"
* Consumes: a stream value
* Produces: a standard value, e.g. the sum of the stream

So when using iteratee-based methods, you'll start off with an 
enumerator, then have a chain of zero or more enumeratees, and then 
finally have the iteratee. The inputs to the enumerator and the outputs 
from the iteratee are just normal values.


If you're familiar with folds, then maybe you're familiar with list 
fusion? There are two basic kinds of list fusion: build/foldr, and 
unfoldr/destroy. The difference between them is just like the difference 
between iteratee-style streams and the standard iterator-style streams. 
Every time we walk over a stream/list in order to compute something, 
there are three steps: the production step, the consumption step, and 
the recursion--- the choice is how we put those three steps together. In 
build/foldr fusion we group the recursion with consumption (foldr); in 
unfoldr/destroy fusion we group the recursion step with production 
(unfoldr).


In the standard iterator-style we have an "iterator" which produces 
values on demand, and then a for-loop or similar which consumes the 
values and does the recursion/iteration. However, this is problematic 
because the iterator never knows if the for-loop will call it again, and 
so it doesn't know when to release resources like file handles.


In the iteratee-style, the enumerator is in charge of both production 
and recursion, and so it can keep forcing the iteratee to consume values 
until the iteratee tells the enumerator it's done. This way the 
enumerator knows when it's finished, and so it can release resources in 
a timely fashion.


Anything other than the above is implementation details which will vary 
depending on the implementation. Make sense?


--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-25 Thread Stephen Tetley
On 25 February 2011 20:38,   wrote:

> The short version is that I think there is a more enlightening view of
> iteratees than as a kind of a fold.  For me, it makes a lot more sense to
> think of them as operations in a particular abstract monad which has one
> associated operation, a blocking read.  Under that view, it is also very
> easy to extend them in arbitrary directions, such as adding support for
> incremental output.

There was a thread on Haskell-cafe a while ago noting some similarity
between the iteratees and the resumption monad.

http://www.haskell.org/pipermail/haskell-cafe/2010-August/082533.html

Note the archives indexing is a little disjointed.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-25 Thread mokus
On Fri, February 25, 2011 11:24 am, Andrew Coppin wrote:
> On 25/02/2011 02:16 AM, wren ng thornton wrote:
>
> > Or
> > converting the whole thing to an iteratee-style computation which is
> > more explicit about the type of stream processing involved and thus what
> > kinds of laziness are possible.
>
> I've heard much about this "iteratee" things, but I've never looked into
> what the hell it actually is.
>
> Today I had a look at TMR #16, which is an explanation which I can just
> about follow. It seems that it's actually a kind of fold - not unlike
> the "streams" of the stream-fusion library (which is something like what
> I thought I might end up needing). It seems to handle *input* very
> nicely, but I don't see much in the way of handling *output* well. (Then
> again, iteratee is just too complex to really comprehend properly.)

I also have had trouble digesting a lot of the literature on iteratees.  A
while back, I wrote up sort of a critique of/alternative to the current
presentations (largely for the self-enlightenment that comes from
wrestling with the concepts myself) and came up with a rather different
perspective on the subject.  I haven't previously shared it, because it's
extremely incomplete (especially the part about enumerators, which I was
about halfway through completely rewriting when I ran out of steam) and is
addressed to a very small (quite possibly non-existent) audience, but feel
free to take a look at it.  I've type-set the document in its current
state to a PDF at:

https://github.com/mokus0/junkbox/blob/master/Papers/HighLevelIteratees/HighLevelIteratees.pdf

This very well may do more to cloud the issue than clarify, and if so I'm
sorry - feel free to disregard me ;)

The short version is that I think there is a more enlightening view of
iteratees than as a kind of a fold.  For me, it makes a lot more sense to
think of them as operations in a particular abstract monad which has one
associated operation, a blocking read.  Under that view, it is also very
easy to extend them in arbitrary directions, such as adding support for
incremental output.

In any case, regarding your original question - I think iteratees are not
the right tool, if for no other reason than that the current
implementations are in my opinion far too brain-bending to use, especially
when it comes to enumeratees which is what you probably need.  Lazy ST
should fit the bill, though.  It works just like normal ST, but acts as if
every bind has 'unsafeInterleaveST'.  There's a good chance that just
changing the imports on your existing code (Control.Monad.ST ->
Control.Monad.ST.Lazy, Data.STRef -> Data.STRef.Lazy, etc.) will make it
work.

-- James


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-25 Thread Andrew Coppin

On 25/02/2011 02:16 AM, wren ng thornton wrote:


Given only this specification, the problem is overconstrained, which is
why you get too much strictness. That is, your types are too general to
allow you to do what you want (e.g., they allow the first Word16 to
depend on the last Word8).


Hmm, true I suppose.


What is it that transform is supposed to do?


Data compression. The input list is raw data, the output list is 
compressed data.


Of course, I *could* just sink the whole thing into the IO monad. But 
that seems a pitty...



As for figuring out how to do it, first consider the following:


Ow, my head! >_<


You can't do that in ST, since allowing this would mean that multiple
evaluations of the (ST s a) embedded in the result could return
different answers and communicate with one another[1].


Yeah, that's the essential conclusion I came to.


But you're probably better off using State[2] instead of ST.


I'm using ST because I want mutable arrays. It's more efficient.


Or
converting the whole thing to an iteratee-style computation which is
more explicit about the type of stream processing involved and thus what
kinds of laziness are possible.


I've heard much about this "iteratee" things, but I've never looked into 
what the hell it actually is.


Today I had a look at TMR #16, which is an explanation which I can just 
about follow. It seems that it's actually a kind of fold - not unlike 
the "streams" of the stream-fusion library (which is something like what 
I thought I might end up needing). It seems to handle *input* very 
nicely, but I don't see much in the way of handling *output* well. (Then 
again, iteratee is just too complex to really comprehend properly.)


The other thing that suggests itself to me: Maybe what I want is not so 
much an ST *monad*, but rather an ST *arrow*. (Isn't one of the 
properties of arrows that the _output_ as well as the input is 
parameterised?)


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-25 Thread Kevin Quick

On Thu, 24 Feb 2011 13:45:59 -0700, Andrew Coppin  
wrote:


The input list is being read from disk by lazy I/O. With the original 
implementation, the input file gets read at the same time as the output file is 
written. But runST returns nothing until the *entire* input has been 
compressed. So writing to disk doesn't start until the entire file has been 
slurped up into memory.
Anybody have any hints on how to get around this?



I'd recommend using an enumerator/iterator package to read and process the file 
as a stream of chunks.  The assumption here is that you don't need the entire 
input to provide enough state to begin generating output.

--
-KQ

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-25 Thread Luke Palmer
Lazy ST is capable of returning values lazily.  Not naively -- eg. if
you are writing elements to an STRef and then returning the contents
of the STRef at the end, then of course it will not return gradually
(who's to say that the last thing you do before you return isn't to
write [] to the STRef?)

However, if you do it this way:

import Control.Monad.ST.Lazy
import Data.STRef.Lazy

main = print $ runST work
where
work = do
ref <- newSTRef 0
let loop = do
x <- readSTRef ref
writeSTRef ref $! x+1
fmap (x:) loop
loop

You will find that it is perfectly lazy.  You just have to communicate
that the computation *must* yield an element regardless of what the
remainder is.  "fmap (x:) rest" is the typical way I yield elements
from lazy ST.

Luke

On Thu, Feb 24, 2011 at 7:55 PM, Sterling Clover  wrote:
> On Feb 24, 2011, at 3:45 PM, Andrew Coppin wrote:
>
> OK, so I had a function that looks like
>
>  transform :: [Word8] -> [Word16]
>
> It works nicely, but I'd like to use mutable state inside. No problem! Use
> the ST monad. Something like
>
>  transform :: [Word8] -> [Word16]
>  transform xs = runST (work xs)
>    where
>  work :: [Word8] -> ST s [Word16]
>
> Ah, yes, well there is one *small* problem... If you do that, the function
> becomes too strict.
>
> unsafeInterleaveST?
> http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Monad-ST.html#v:unsafeInterleaveST
> --Sterl
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-24 Thread Sterling Clover
On Feb 24, 2011, at 3:45 PM, Andrew Coppin wrote:

> OK, so I had a function that looks like
> 
>  transform :: [Word8] -> [Word16]
> 
> It works nicely, but I'd like to use mutable state inside. No problem! Use 
> the ST monad. Something like
> 
>  transform :: [Word8] -> [Word16]
>  transform xs = runST (work xs)
>where
>  work :: [Word8] -> ST s [Word16]
> 
> Ah, yes, well there is one *small* problem... If you do that, the function 
> becomes too strict.

unsafeInterleaveST?

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Monad-ST.html#v:unsafeInterleaveST

--Sterl___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-24 Thread wren ng thornton

On 2/24/11 3:45 PM, Andrew Coppin wrote:

OK, so I had a function that looks like

transform :: [Word8] -> [Word16]

It works nicely, but I'd like to use mutable state inside. No problem!
Use the ST monad. Something like

transform :: [Word8] -> [Word16]
transform xs = runST (work xs)
where
work :: [Word8] -> ST s [Word16]

Ah, yes, well there is one *small* problem... If you do that, the
function becomes too strict.


Given only this specification, the problem is overconstrained, which is 
why you get too much strictness. That is, your types are too general to 
allow you to do what you want (e.g., they allow the first Word16 to 
depend on the last Word8). What is it that transform is supposed to do?


As for figuring out how to do it, first consider the following:

-- | @fix (PreList a) == [a]@ modulo extra bottoms.
type PreList a b = Maybe (a,b)

fmap_PreList :: (b -> c) -> PreList a b -> PreList a c
fmap_PreList f Nothing = Nothing
fmap_PreList f (Just(a,b)) = Just (a, f b)

enlist :: PreList a [a] -> [a]
enlist Nothing   = []
enlist (Just (x,xs)) = x:xs

prelist :: [a] -> PreList a [a]
prelist [] = Nothing
prelist (x:xs) = Just (x,xs)

-- | Monadic version of @Data.List.unfoldr@.
unfoldM :: (Monad m) => (b -> m (PreList a b)) -> (b -> m [a])
unfoldM coalgM b = do
m <- coalgM b
case m of
Nothing -> return []
Just (a,b') -> (a:) `liftM` unfoldM coalgM b'

Assuming that we can generate the elements of [Word16] incrementally, 
then this function almost gives us what we need. The problem is that 
even though the (a:) part is pure by the time we reach it, we can't see 
that fact because of the liftM pushing it down into the monad again. To 
put this a different way, consider the following distributive law:


distList :: (Monad m) => m (PreList a (m [a])) -> m [a]
distList mx_mxs = do
maybe_x_mxs <- mx_mxs
case maybe_x_mxs of
Nothing  -> return []
Just (x,mxs) -> (x:) `liftM` mxs

{- N.B.,
unfoldM coalgM == distList . mfmap (unfoldM coalgM) . coalgM
where
mfmap :: (b -> c) -> m (PreList a b) -> m (PreList a c)
mfmap = liftM . fmap_PreList
-}

In order to factor out the (a:) constructor we need to find some way of 
*not* using distList in unfoldM. That way, the monadic effects 
associated with the head of the list can be separated from the effects 
associated with the tail of the list. Unfortunately, the obvious attempt 
doesn't typecheck.


unfoldM'
:: (Monad m)
=> (b -> m (PreList a b))
-> b -> fix (\rec -> m (PreList a rec))
unfoldM' coalgM = mfmap (unfoldM' coalgM) . coalgM

One problem is the fact that we can't write infinite types, though we 
can get around that easily by using a newtype. The other problem is that 
we need a function for running ST in a way that allows nested ST to be 
run at some later time. Something like,


semirunST :: (Functor f)
  => (forall s. ST s (f (ST s a))) -> f (ST s a)

You can't do that in ST, since allowing this would mean that multiple 
evaluations of the (ST s a) embedded in the result could return 
different answers and communicate with one another[1]. However, if you 
use another monad for encapsulating memory regions (e.g., ST RealWorld, 
STM, IO) then you can probably get away with it.


But you're probably better off using State[2] instead of ST. Or 
converting the whole thing to an iteratee-style computation which is 
more explicit about the type of stream processing involved and thus what 
kinds of laziness are possible.



[1] Though it would be safe to combine it with the newtype:

newtype Compose f g x = Compose (f (g x))
newtype Fix f = Fix (f (Fix f))
interleaveST :: (Functor f) => Fix (Compose (ST s) f) -> Fix f

But given the API for ST, you can't define interleaveST in a way that 
actually interleaves evaluation instead of using a distributive law for 
pulling the (ST s) up over f and then running everything at once.


[2] State is easy:

runfoldState :: (b -> State s (PreList a b)) -> b -> s -> [a]
runfoldState coalgM = evalState . rec
where
rec b = do
m <- coalgM b
case m of
Nothing -> return []
Just (a,b') -> do
s <- get
return (a : evalState (rec b') s)

--
Live well,
~wren

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-24 Thread Alexander Solla
On Thu, Feb 24, 2011 at 2:42 PM, Andrew Coppin
wrote:

>Anybody have any hints on how to get around this?
>>
>> Use a lazy state monad?
>>
>
> That's not going to work. It still needs to read the input to determine
> which monadic action comes next, and hence what the final result will be. So
> whether it forces the result or not, it still has to scan the entire input
> before it can generate any output.
>
>
>From the sound of it, you want some kind of lazy IO, driven/generated by a
state monad.  Check out the "safe-lazy-io".  I've never used it, but the
announcement is pretty convincing.
http://www.haskell.org/pipermail/haskell/2009-March/021133.html
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-24 Thread Andrew Coppin

Anybody have any hints on how to get around this?

Use a lazy state monad?


That's not going to work. It still needs to read the input to determine 
which monadic action comes next, and hence what the final result will 
be. So whether it forces the result or not, it still has to scan the 
entire input before it can generate any output.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Fun with the ST monad

2011-02-24 Thread Alexander Solla
On Thu, Feb 24, 2011 at 12:45 PM, Andrew Coppin  wrote:

>
>
> Ah, yes, well there is one *small* problem... If you do that, the function
> becomes too strict.
>
> The input list is being read from disk by lazy I/O. With the original
> implementation, the input file gets read at the same time as the output file
> is written. But runST returns nothing until the *entire* input has been
> compressed. So writing to disk doesn't start until the entire file has been
> slurped up into memory.
>
> Anybody have any hints on how to get around this?
>

Use a lazy state monad?
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe