Re: [Haskell-cafe] Space usage problems

2006-01-10 Thread Chris Kuklewicz
I'll make a guess...

Ian Lynagh wrote:
> Hi all,
> 
> In the middle (readChunks) is the equivalent of gunzip. It repeatedly
> calls foo until there is no more input left.
> 
> At the top is a simple main function that calls them.
> 
> If I do
> 
> dd if=/dev/zero of=data bs=1000 count=3000" # making data around 3MB
> ghc --make Test1 -o Test1 -O -Wall
> ./Test1
> 
> then in top I see Test1 increasing memory usage to around 150MB. I think
> this is because the "let (ys, zs) = foo xs" means zs holds on to xs
> (it's hard to be sure as compiling for profiling is too happy to change
> the behaviour).

I don't have a comment on your guess.

> 
> I tried (Test2) changing foo to be a monad transformer over the calling
> monad, so the caller's remaining input was updated as we went along, but
> (as well as memory usage not obviously being fixed) this is giving me a
> stack overflow.

I will ignore Test2

> Has anyone got any suggestions for making a constant space, constant
> stack version?

Not yet.

> 
> 
> Thanks
> Ian

> 
> 
> 
> module Main (main) where
> 
> import Control.Monad (liftM)
> import Control.Monad.State (State, runState, evalState, get, put)
> 
> main :: IO ()
> main = do xs <- readFile "data"
>   ys <- readFile "data"
>   print (evalState readChunks xs == ys)

The equality should be constant space.

> 
> ---
> 
> type FirstMonad = State String
> 
> readChunks :: FirstMonad String
> readChunks = do xs <- get
> if null xs then return []
>else do let (ys, zs) = foo xs
>put zs
And zs is the final state of "runState bar" which is suspect is []
And ys is the whole input (which is now all in memory)
>rest <- readChunks
>return (ys ++ rest)
> 
> ---
> 
> type SecondMonad = State String
> 
> foo :: String -> (String, String)
> foo = runState bar
> 
> bar :: SecondMonad String
> bar = do inp <- get
>  case inp of
>  [] -> return []
>  x:xs -> do put xs
> liftM (x:) bar
The liftM should be equivalent to
  temp <- bar
  return ( (x:) temp )

It looks like the first call to foo will have bar consuming the entire
input string.

So the flow looks like

main

 readChuncks all-input

  foo all-input

   bar (iterated over whole input length)

   foo returns (all-input, [])

  "rest <- readChunks" (recursive call, sees null xs then "return []")

  "return (ys ++ rest)" which is return (all-input ++ [])


In essence, your bar traverses the whole string until the state is
empty.  This loads your whole file into memory
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Space usage problems

2006-01-10 Thread Ian Lynagh
On Tue, Jan 10, 2006 at 05:28:03PM +, Chris Kuklewicz wrote:
> I'll make a guess...
> 
> Ian Lynagh wrote:
> > Hi all,
> > 
> > foo :: String -> (String, String)
> > foo = runState bar
> > 
> > bar :: SecondMonad String
> > bar = do inp <- get
> >  case inp of
> >  [] -> return []
> >  x:xs -> do put xs
> > liftM (x:) bar
> The liftM should be equivalent to
>   temp <- bar
>   return ( (x:) temp )
> 
> It looks like the first call to foo will have bar consuming the entire
> input string.

I'm not entirely sure what you mean here. The result will be the entire
input string, but State is a lazy monad, so it won't have to consume it
all before it starts returning it.

For example, if you replace the definition of foo with

foo xs = (evalState bar xs, "")

then the program runs in constant space (but this isn't a solution to
the real problem, as bar will only consume a prefix of the string
there).


Thanks
Ian

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Space usage problems

2006-01-10 Thread Chris Kuklewicz
I will continue to guess...

Ian Lynagh wrote:
> On Tue, Jan 10, 2006 at 05:28:03PM +, Chris Kuklewicz wrote:
> 
>>I'll make a guess...
>>
>>Ian Lynagh wrote:
>>
>>>Hi all,
>>>
>>>foo :: String -> (String, String)
>>>foo = runState bar
>>>
>>>bar :: SecondMonad String
>>>bar = do inp <- get
>>> case inp of
>>> [] -> return []
>>> x:xs -> do put xs
>>>liftM (x:) bar
>>
>>The liftM should be equivalent to
>>  temp <- bar
>>  return ( (x:) temp )
>>
>>It looks like the first call to foo will have bar consuming the entire
>>input string.
> 
> 
> I'm not entirely sure what you mean here. The result will be the entire
> input string, but State is a lazy monad, so it won't have to consume it
> all before it starts returning it.
> 
> For example, if you replace the definition of foo with
> 
> foo xs = (evalState bar xs, "")
> 
> then the program runs in constant space (but this isn't a solution to
> the real problem, as bar will only consume a prefix of the string
> there).
> 
Yes, exactly as I would have predicted.

Your "let (yx,zs) = foo xs
  put zs"
takes the second of the tuple retuned from "foo = runState bar" and
put's it.  Then in the recursive readChucks call, it does
  xs <- get
  if (null xs)

So it has to decide if xs (which is zs which is the snd value from foo
which is the state from runState bar which is "" or []) is null or not.

This forces it to get the head of the String state that bar returns, or
[] since there is no head.  But it does not know that it is [] until bar
is fully finished, which loads the whole file into memory.

When you put (evalState bar xs, "") then zs is [] and put [] leads to
get [] and null [] is true so it returns [] to the nested readChunks
call.  This does not force the file to be read.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Space usage problems

2006-01-10 Thread Daniel Fischer
Am Dienstag, 10. Januar 2006 17:44 schrieb Ian Lynagh:
> Hi all,
>
> I am having space issues with some decompression code; I've attached a
> much simplified version as Test1.hs.
>
> At the bottom (foo/bar) is the equivalent of deflate. This should be a
> standalone module which doesn't know about the rest.
>
> In the middle (readChunks) is the equivalent of gunzip. It repeatedly
> calls foo until there is no more input left.
>
> At the top is a simple main function that calls them.
>
> If I do
>
> dd if=/dev/zero of=data bs=1000 count=3000" # making data around 3MB
> ghc --make Test1 -o Test1 -O -Wall
> ./Test1
>
> then in top I see Test1 increasing memory usage to around 150MB. I think
> this is because the "let (ys, zs) = foo xs" means zs holds on to xs
> (it's hard to be sure as compiling for profiling is too happy to change
> the behaviour).

I had 72 Mb space usage for a 3Mb file.

I believe, it's the 'put zs' that's consuming the memory. I changed it to

readChunks = do xs <- get
if null xs then return []
   else do let (ys, zs) = foo xs
   rest = evalState readChunks zs
   return (ys ++ rest)

and got much smaller memory usage (10Mb) -- not sure, how sensible that would 
be for real work and why it reduces memory. If bar can start returning before 
it's finished, then the same holds for the modified readChunks, but the 
original would have to wait for the completion of bar (via foo) until zs can 
be put, so the complete ys would have to be in memory at once.

Just checked, modified version also runs in 10Mb for a 12mb data file, 
so indeed bar starts returning before finishing and it seems the above is 
right.
./test4 +RTS -sstderr
True
1,496,184,404 bytes allocated in the heap
987,852,924 bytes copied during GC
  3,226,492 bytes maximum residency (162 sample(s))

   5707 collections in generation 0 ( 12.66s)
162 collections in generation 1 ( 14.82s)

 10 Mb total memory in use

  INIT  time0.00s  (  0.01s elapsed)
  MUT   time6.88s  ( 15.06s elapsed)
  GCtime   27.48s  ( 55.93s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time   34.36s  ( 71.00s elapsed)

  %GC time  80.0%  (78.8% elapsed)

  Alloc rate217,468,663 bytes per MUT second

  Productivity  20.0% of total user, 9.7% of total elapsed


>
> I tried (Test2) changing foo to be a monad transformer over the calling
> monad, so the caller's remaining input was updated as we went along, but
> (as well as memory usage not obviously being fixed) this is giving me a
> stack overflow.
>
>
> Has anyone got any suggestions for making a constant space, constant
> stack version?
>
>
> Thanks
> Ian

Hope that helps,
Daniel

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Space usage problems

2006-01-10 Thread Ian Lynagh
On Tue, Jan 10, 2006 at 04:44:33PM +, Ian Lynagh wrote:
> 
> readChunks :: FirstMonad String
> readChunks = do xs <- get
> if null xs then return []
>else do let (ys, zs) = foo xs
>put zs
>rest <- readChunks
>return (ys ++ rest)

It looks like changing this let to a case fixes this example, but at the
time I'd experimented with that there must have been other issues
clouding the effect, such as the following.

Foo1 (attached) uses large amounts of memory whereas Foo2 (also
attached) runs in a little constant space. The difference is only
changing this:

else do chunk <- case foo xs of
 (ys, zs) ->
 do put zs
return ys
chunks <- readChunks
return (chunk ++ chunks)

to this:

else case foo xs of
 (ys, zs) ->
 do put zs
chunks <- readChunks
return (ys ++ chunks)

but I don't have a good feeling for why this should be the case given
I'd expect chunk to be forced, and thus the case evaluated, at the same
point in Foo1 as the case is evaluated in Foo2.

Is this just a case of GHC's optimiser's behaviour depending on subtle
source changes, or am I missing something?


Thanks
Ian


module Main (main) where

import Control.Monad (liftM)
import Control.Monad.State (State, runState, evalState, get, put)

main :: IO ()
main = do xs <- readFile "data"
  ys <- readFile "data"
  print (evalState readChunks xs == ys)

---

type FirstMonad = State String

readChunks :: FirstMonad String
readChunks = do xs <- get
if null xs then return []
   else do chunk <- case foo xs of
(ys, zs) ->
do put zs
   return ys
   chunks <- readChunks
   return (chunk ++ chunks)

---

type SecondMonad = State String

foo :: String -> (String, String)
foo = runState bar

bar :: SecondMonad String
bar = do inp <- get
 case inp of
 [] -> return []
 x:xs -> do put xs
liftM (x:) bar


module Main (main) where

import Control.Monad (liftM)
import Control.Monad.State (State, runState, evalState, get, put)

main :: IO ()
main = do xs <- readFile "data"
  ys <- readFile "data"
  print (evalState readChunks xs == ys)

---

type FirstMonad = State String

readChunks :: FirstMonad String
readChunks = do xs <- get
if null xs then return []
   else case foo xs of
(ys, zs) ->
do put zs
   chunks <- readChunks
   return (ys ++ chunks)

---

type SecondMonad = State String

foo :: String -> (String, String)
foo = runState bar

bar :: SecondMonad String
bar = do inp <- get
 case inp of
 [] -> return []
 x:xs -> do put xs
liftM (x:) bar

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe