Send Beginners mailing list submissions to
        beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
        beginners-requ...@haskell.org

You can reach the person managing the list at
        beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."


Today's Topics:

   1. [Haskell-Beginners] Laziness & referential transparency
      (Tim Baumgartner)
   2. Re: [Haskell-Beginners] Laziness & referential transparency
      (Daniel Fischer)
   3. Re: [Haskell-Beginners] Laziness & referential transparency
      (Tim Baumgartner)
   4. Re: [Haskell-Beginners] Laziness & referential transparency
      (Daniel Fischer)


----------------------------------------------------------------------

Message: 1
Date: Sat, 29 Jan 2011 15:16:41 +0100
From: Tim Baumgartner <baumgartner....@googlemail.com>
Subject: [Haskell-beginners] Laziness & referential transparency
To: beginners@haskell.org
Message-ID:
        <aanlktinhpovqtl7yzcw-wvbnf6xlfvuhh3hgde7jn...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi everybody,

after reading a post by Don Stewart on
http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast, I was not really
satisfied with the necessity to rewrite

mean xs = sum xs / fromIntegral (length xs)

to an explicitly recursive function. I investigated a bit further and got
really confused after the following GHCi session:

Prelude> length [1..10^8] + length [1..10^8]
200000000
Prelude> let xs = [1..10^8]
Prelude> length xs + length xs
<interactive>: out of memory

Can someone explain this and give a simple, reasonable rule by which I can
predict such things in the future? Isn't it a pitfall?

Tim

------------------------------

Message: 2
Date: Sat, 29 Jan 2011 16:18:31 +0100
From: Daniel Fischer <daniel.is.fisc...@googlemail.com>
Subject: Re: [Haskell-beginners] Laziness & referential transparency
To: beginners@haskell.org
Message-ID: <201101291618.32124.daniel.is.fisc...@googlemail.com>
Content-Type: text/plain;  charset="utf-8"

On Saturday 29 January 2011 15:16:41, Tim Baumgartner wrote:
> Hi everybody,
>
> after reading a post by Don Stewart on
> http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast, I was not really
> satisfied with the necessity to rewrite
>
> mean xs = sum xs / fromIntegral (length xs)
>
> to an explicitly recursive function.

We'd all love a compiler clever enough to transform simple high-level 
specifications into efficient code. But that's very hard and while we're 
waiting, we have to go lower-level sometimes.
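For instance, a strict, explicitly recursive version might look like this 
(a sketch of the usual accumulator pattern, not the code from the blog 
post):

```haskell
-- A one-pass, constant-space mean: sum and count are accumulated
-- together, and forced with seq so no thunks pile up.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go s n []     = s / fromIntegral n
    go s n (x:xs) = s `seq` n `seq` go (s + x) (n + 1) xs
```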

> I investigated a bit further and
> got really confused after the following GHCi session:
>
> Prelude> length [1..10^8] + length [1..10^8]
> 200000000
> Prelude> let xs = [1..10^8]
> Prelude> length xs + length xs
> <interactive>: out of memory
>
> Can someone explain this

Yes.

Prelude> length [1 .. 10^8] + length [1 .. 10^8]

creates two lists, each is generated lazily and garbage collected as it is 
consumed by length, so the entire calculation runs in constant space, just 
not particularly fast.

Since you haven't turned off the monomorphism restriction in ghci 
(otherwise the computation would run in constant space),

Prelude> let xs = [1 .. 10^8]

defines a list xs :: [Integer] (the definition yields the constraints Num a 
and Enum a; per the monomorphism restriction, the type variable a defaults 
to Integer), which will be stored for later reuse.

Prelude> length xs + length xs

then starts calculating the first length. That means xs will be completely 
evaluated in the course of that (if there is sufficient memory). Since the 
list is bound to a name in scope, it doesn't get garbage collected, so with 
enough memory, you'll have a 100 million long list of Integers in your 
memory (I always forget how many words per cons cell are required, but 
that's something like 400 or 500 million words, 1.6 or 2 / 3.2 or 4 GB, 
depending on whether you're on a 32-bit or a 64-bit system, iirc).
In your case, that exhausts your RAM, otherwise calculating the second 
length would be faster since the list need not be created a second time.

With the monomorphism restriction turned off,

Prelude> let xs = [1 .. 10^8]
(0.04 secs, 9183720 bytes)
Prelude> length xs + length xs
200000000
(8.02 secs, 8033302172 bytes)
Prelude> :t xs
xs :: (Enum a, Num a) => [a]

xs remains a polymorphic value, so doesn't get shared across computations 
(since each computation could use it at a different type).
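The same distinction can be reproduced in a compiled module by giving 
explicit signatures (a small sketch; the names are made up for 
illustration):

```haskell
-- With a monomorphic signature the list is a plain CAF: it is built
-- once and then retained and shared between uses.
xsMono :: [Integer]
xsMono = [1 .. 10^6]

-- With a polymorphic signature the "list" is really a function of the
-- Enum/Num dictionaries, so each use site (possibly at a different
-- type) rebuilds it.
xsPoly :: (Enum a, Num a) => [a]
xsPoly = [1 .. 10^6]
```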


But.
Trying such stuff at the ghci prompt doesn't always tell you much about how 
compiled code behaves.
For example:

module Length where

val :: Int
val = length [1 .. 10^8] + length [1 .. 10^8]

compiled with -O, gives the core (other stuff snipped)

Length.val :: GHC.Types.Int
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 6 2}]
Length.val =
  case GHC.List.$wlen @ GHC.Integer.Type.Integer Length.val1 0
  of ww_ayU { __DEFAULT ->
  GHC.Types.I# (GHC.Prim.+# ww_ayU ww_ayU)
  }

meaning that the value "length [1 .. 10^8]" is reused, thus giving only one 
list traversal (and the list can be garbage-collected as it is consumed).

> and give a simple, reasonable rule by which I
> can predict such things in the future?

No. Just a few rules of thumb.
If an expression gets a name (by a top level definition or a let/where 
binding), it is reused (unless polymorphism prohibits that, that's one of 
the reasons for the MR, without the MR, stuff expected to be shared will be 
recomputed).
GHC does little common subexpression elimination, so if an expression 
contains two identical subexpressions, they may not be shared. Sharing is 
more likely if the subexpressions are simple and 'close together' (for very 
fuzzy values of simple and close).
Unfortunately, one thing that GHC spots well is enumFromTo 
(sugared: [a .. b]), meaning that long lists are often shared even when, 
for memory reasons, they shouldn't be.
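So when sharing matters either way, it is safest to make your intent 
explicit rather than rely on CSE (again a sketch, with made-up names):

```haskell
-- Explicit sharing: name the subexpression with let, traverse once.
shared :: Int
shared = let n = length [1 .. 10^6 :: Int] in n + n

-- Writing the expression out twice leaves it to GHC whether the two
-- lists are shared; with -O and enumFromTo they often are anyway.
recomputed :: Int
recomputed = length [1 .. 10^6 :: Int] + length [1 .. 10^6 :: Int]
```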

> Isn't it a pitfall?

Yes, it is.




------------------------------

Message: 3
Date: Sat, 29 Jan 2011 16:36:09 +0100
From: Tim Baumgartner <baumgartner....@googlemail.com>
Subject: Re: [Haskell-beginners] Laziness & referential transparency
To: beginners@haskell.org, Daniel Fischer
        <daniel.is.fisc...@googlemail.com>
Message-ID:
        <AANLkTi=_zxjzrzxu_duewzywgk5crkfpnbt5kmwoj...@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Daniel,

thanks for your explanations! For some time, I thought laziness might be
some kind of holy grail. But now I realize that it both takes a lot of time
to understand properly and has more limitations than I expected.
Another example of this: I wrote a function

recursiveContents :: FilePath -> IO [FilePath]

and later on I thought it was possible to run some actions on every file
from the list in a lazy fashion. Then I had to find out that all files had
to be found before I could process the first one. By now, I think I
understand why, but it stops me from being too enthusiastic about laziness
:-(

Regards
Tim



------------------------------

Message: 4
Date: Sat, 29 Jan 2011 16:53:55 +0100
From: Daniel Fischer <daniel.is.fisc...@googlemail.com>
Subject: Re: [Haskell-beginners] Laziness & referential transparency
To: Tim Baumgartner <baumgartner....@googlemail.com>
Cc: beginners@haskell.org
Message-ID: <201101291653.55764.daniel.is.fisc...@googlemail.com>
Content-Type: text/plain;  charset="utf-8"

On Saturday 29 January 2011 16:36:09, Tim Baumgartner wrote:
> Daniel,
>
> thanks for your explanations! For some time, I thought laziness might be
> some kind of holy grail.

There is no such thing.
Laziness makes it possible to write many algorithms in a concise and 
elegant manner which can't be so expressed in a strict language.
But on the other hand, laziness makes it harder to predict performance and 
memory usage (experience helps a lot with that).

> But now I realize that it both takes a lot of
> time to understand properly and has more limitations than I
> expected. Another example of this: I wrote a function
>
> recursiveContents :: FilePath -> IO [FilePath]
>
> and later on I thought it was possible to run some actions on every file
> from the list, in a lazy fashion. Then I had to find out that all files
> had to be found before I could process the first one.

That depends. Often you can use unsafeInterleaveIO (which is not nearly as 
unsafe as unsafePerformIO) to get the list lazily even in IO. Then, if your 
consumption pattern allows it, you can process the first file before the 
second is found (if, however, processing the files can influence the result 
of recursiveContents, you can see why there is an unsafe in 
unsafeInterleaveIO).
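
A lazy variant might be sketched like this (lazyContents is a made-up name, 
not your actual recursiveContents, and the caveat above applies):

```haskell
import Control.Monad (forM)
import System.Directory  -- getDirectoryContents, doesDirectoryExist
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO)

-- Each directory entry's work is wrapped in unsafeInterleaveIO, so a
-- subdirectory is only scanned when the consumer forces that part of
-- the result list.
lazyContents :: FilePath -> IO [FilePath]
lazyContents dir = do
  names <- getDirectoryContents dir
  let proper = filter (`notElem` [".", ".."]) names
  paths <- forM proper $ \name -> unsafeInterleaveIO $ do
    let path = dir </> name
    isDir <- doesDirectoryExist path
    if isDir
      then fmap (path :) (lazyContents path)
      else return [path]
  return (concat paths)
```

Since concat is lazy, the head of the list is available as soon as the 
first entry has been examined, before the rest of the tree is walked.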

> By now, I think I
> understand why, but it stops me from being too enthusiastic about
> laziness

Well, as with most things, there are advantages and disadvantages.

>
> :-(
>
> Regards
> Tim




------------------------------

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners


End of Beginners Digest, Vol 31, Issue 37
*****************************************
