Re: [Haskell-cafe] IO Put confusion

2010-09-16 Thread Ben Millwood
On Wed, Sep 15, 2010 at 12:45 AM, Chad Scherrer wrote:
> Hello,
>
> I need to be able to use strict bytestrings to efficiently build a
> lazy bytestring, so I'm using putByteString in Data.Binary. But I also
> need random numbers, so I'm using mwc-random. I end up in the "IO Put"
> monad, and it's giving me some issues.
>
> To build a random document, I need a random length, and a collection
> of random words. So I have
> docLength :: IO Int
> word :: IO Put
>
> Oh, also
> putSpace :: Put
>
> My first attempt:
> doc :: IO Put
> doc = docLength >>= go
>  where
>  go 1 = word
>  go n = word >> return putSpace >> go (n-1)

I think you misunderstand, here, what return does, or possibly >>.
This function generates docLength random words, but discards all of
them except for the last one. That's what the >> operator does: run
the IO involved in the left action, but discard the result before
running the right action.

The IO action 'return x' doesn't do any IO, so 'return x >> a' does
nothing, discards x, and then does a, i.e.

return x >> a = a
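
To make the discarding visible, here is a minimal sketch that needs only
base. Everything in it is a stand-in: 'word' takes a hypothetical counter
and returns "w1", "w2", ... as a String instead of a random Put, and
'putSpace' is a String too. Only the shape of 'go' is copied from the
quoted first attempt.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical stand-in for `word :: IO Put`: each call bumps a counter
-- and returns "w1", "w2", ... so we can see which results survive.
word :: IORef Int -> IO String
word ref = do
  modifyIORef' ref (+ 1)
  n <- readIORef ref
  return ("w" ++ show n)

-- Stand-in for `putSpace :: Put`.
putSpace :: String
putSpace = " "

-- Same shape as the first attempt in the quoted message.
firstAttempt :: IORef Int -> Int -> IO String
firstAttempt ref = go
  where
    go 1 = word ref
    go n = word ref >> return putSpace >> go (n - 1)

-- Runs the first attempt for a 3-word document and returns
-- (final result, number of words that were actually generated).
demo :: IO (String, Int)
demo = do
  ref <- newIORef 0
  out <- firstAttempt ref 3
  count <- readIORef ref
  return (out, count)
```

Running 'demo' gives ("w3", 3): all three words were generated, but only
the last one is returned.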

> Unfortunately, with this approach, you end up with a one-word
> document. I think this makes sense because of the monad laws, but I
> haven't checked it.

Yes, the above equation is required to hold for any monad (it is a
consequence of the law that 'return x >>= f = f x', since 'm >> k' is
defined as 'm >>= \_ -> k').
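
Spelled out, the reasoning is three steps. The sketch below writes them
as comments and then spot-checks the equation in two monads other than
IO; the names 'checkMaybe' and 'checkList' are only for illustration.

```haskell
-- Equational reasoning, using the default definition  m >> k = m >>= \_ -> k :
--
--   return x >> a
--     = return x >>= \_ -> a     -- definition of (>>)
--     = (\_ -> a) x              -- left identity: return x >>= f = f x
--     = a                        -- apply the function, which ignores x

-- Spot check in the Maybe monad.
checkMaybe :: Bool
checkMaybe = (return 'x' >> Just 1) == (Just 1 :: Maybe Int)

-- Spot check in the list monad.
checkList :: Bool
checkList = (return 'x' >> [1, 2, 3]) == ([1, 2, 3] :: [Int])
```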

>
> Second attempt:
> doc :: IO Put
> doc = docLength >>= go
>  where
>  go 1 = word
>  go n = do
>    w <- word
>    ws <- go (n-1)
>    return (w >> putSpace >> ws)
>
> This one actually works, but it holds onto everything in memory
> instead of outputting as it goes. If docLength tends to be large, this
> leads to big problems.

Here you're using the >> from the Put monad, which appends lazy
ByteStrings rather than sequencing IO actions. The problem is that the
ordering of IO is strict, which means that 'doc' must generate all the
random words before it returns, i.e. it must be completely done before
L.writeFile gets a look-in.
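
A small sketch of that ordering, assuming the binary and bytestring
packages. 'word' here is a hypothetical stand-in that counts its calls
instead of drawing from mwc-random, and the document length is passed in
rather than taken from 'docLength'; 'doc' has the shape of the second
attempt.

```haskell
import Data.Binary.Put (Put, putByteString, runPut)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Hypothetical stand-in for the random `word :: IO Put`: it records that
-- it ran and always emits the same bytes.
word :: IORef Int -> IO Put
word ref = do
  modifyIORef' ref (+ 1)
  return (putByteString (S.pack "word"))

putSpace :: Put
putSpace = putByteString (S.pack " ")

-- Same shape as the second attempt in the quoted message.
doc :: IORef Int -> Int -> IO Put
doc ref = go
  where
    go 1 = word ref
    go n = do
      w <- word ref
      ws <- go (n - 1)
      return (w >> putSpace >> ws)

-- Returns how many words had been generated by the time `doc` handed
-- back its Put, together with a prefix of the rendered output.
demo :: IO (Int, L.ByteString)
demo = do
  ref <- newIORef 0
  p <- doc ref 4
  generated <- readIORef ref   -- read before any byte has been rendered
  return (generated, L.take 9 (runPut p))
```

'demo' returns 4 as the count: every word has been generated before
'runPut' is even applied, which is why nothing can be written early.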

It turns out the problem you're trying to solve isn't actually simple
at all. Some of the best approaches to efficient incremental IO are
quite involved - e.g. Iteratees. But your case could be made a great
deal easier if you used a pure PRNG instead of one requiring IO. If
you could make word a pure function, something like word :: StdGen ->
(StdGen, Put) (which is more or less the same as word :: State StdGen
Put, apart from the order of the pair), then you'd be able to use it
lazily and safely.
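
A rough sketch of that shape, assuming the binary and bytestring
packages. To avoid depending on the random package, 'Gen' below is a toy
linear congruential generator standing in for StdGen; it, 'next', the
vocabulary and 'render' are all made up for illustration, and the
document length is passed in rather than drawn from the generator.

```haskell
import Data.Binary.Put (Put, putByteString, runPut)
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Word (Word32)

-- Hypothetical stand-in for StdGen: a toy linear congruential generator.
-- Not suitable for real use; it only keeps the sketch free of extra packages.
newtype Gen = Gen Word32

-- Steps the generator and returns a non-negative Int from its high bits.
next :: Gen -> (Gen, Int)
next (Gen s) = (Gen s', fromIntegral (s' `div` 65536))
  where
    s' = s * 1664525 + 1013904223

-- Pure counterpart of `word :: IO Put`, with the type suggested above.
word :: Gen -> (Gen, Put)
word g = (g', putByteString (vocabulary !! (i `mod` length vocabulary)))
  where
    (g', i) = next g
    vocabulary = map S.pack ["lorem", "ipsum", "dolor", "sit", "amet"]

putSpace :: Put
putSpace = putByteString (S.pack " ")

-- Builds a document of n words (at least one), threading the generator.
doc :: Int -> Gen -> Put
doc n g
  | n <= 1    = w
  | otherwise = w >> putSpace >> doc (n - 1) g'
  where
    (g', w) = word g

-- Renders the document as a lazy ByteString.
render :: Int -> Gen -> L.ByteString
render n g = runPut (doc n g)
```

With this, main could be something like
L.writeFile "out.txt" (render n (Gen seed)). Since no IO happens inside
'doc', the words should only be generated as the lazy ByteString is
consumed, so writing can start before the whole document exists.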
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] IO Put confusion

2010-09-14 Thread Chad Scherrer
Hello,

I need to be able to use strict bytestrings to efficiently build a
lazy bytestring, so I'm using putByteString in Data.Binary. But I also
need random numbers, so I'm using mwc-random. I end up in the "IO Put"
monad, and it's giving me some issues.

To build a random document, I need a random length, and a collection
of random words. So I have
docLength :: IO Int
word :: IO Put

Oh, also
putSpace :: Put

My first attempt:
doc :: IO Put
doc = docLength >>= go
  where
  go 1 = word
  go n = word >> return putSpace >> go (n-1)

Unfortunately, with this approach, you end up with a one-word
document. I think this makes sense because of the monad laws, but I
haven't checked it.

Second attempt:
doc :: IO Put
doc = docLength >>= go
  where
  go 1 = word
  go n = do
    w <- word
    ws <- go (n-1)
    return (w >> putSpace >> ws)

This one actually works, but it holds onto everything in memory
instead of outputting as it goes. If docLength tends to be large, this
leads to big problems.

Oh, yes, and my main is currently
main = L.writeFile "out.txt" =<< fmap runPut doc

This needs to be lazier so disk writing can start sooner, and to avoid
eating up tons of memory. Any ideas?

Thanks!
Chad