Send Beginners mailing list submissions to
	beginners@haskell.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://www.haskell.org/mailman/listinfo/beginners
or, via email, send a message with subject or body 'help' to
	beginners-requ...@haskell.org
You can reach the person managing the list at
	beginners-ow...@haskell.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Beginners digest..."

Today's Topics:

   1. [Haskell-Beginners] Laziness & referential transparency
      (Tim Baumgartner)
   2. Re: [Haskell-Beginners] Laziness & referential transparency
      (Daniel Fischer)
   3. Re: [Haskell-Beginners] Laziness & referential transparency
      (Tim Baumgartner)
   4. Re: [Haskell-Beginners] Laziness & referential transparency
      (Daniel Fischer)

----------------------------------------------------------------------

Message: 1
Date: Sat, 29 Jan 2011 15:16:41 +0100
From: Tim Baumgartner <baumgartner....@googlemail.com>
Subject: [Haskell-beginners] Laziness & referential transparency
To: beginners@haskell.org

Hi everybody,

after reading a post by Don Stewart at
http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast, I was not really
satisfied with the necessity to rewrite

    mean xs = sum xs / fromIntegral (length xs)

as an explicitly recursive function. I investigated a bit further and
got really confused by the following GHCi session:

    Prelude> length [1..10^8] + length [1..10^8]
    200000000
    Prelude> let xs = [1..10^8]
    Prelude> length xs + length xs
    <interactive>: out of memory

Can someone explain this and give a simple, reasonable rule by which I
can predict such things in the future? Isn't it a pitfall?

Tim
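The explicitly recursive version that the blog post arrives at can be sketched roughly like this (an illustration of the idea, not Don Stewart's exact code):

```haskell
{-# LANGUAGE BangPatterns #-}

-- A single strict pass: the running sum and count are forced at each
-- step, so the list is consumed (and garbage-collected) as it is
-- generated, instead of being traversed twice.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !s !n []       = s / fromIntegral n
    go !s !n (x:rest) = go (s + x) (n + 1) rest
```

With this definition `mean [1 .. 10]` is 5.5, and `mean [1 .. 10^8]` runs in constant space, whereas the two-pass version keeps the whole list alive between the sum traversal and the length traversal.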
------------------------------

Message: 2
Date: Sat, 29 Jan 2011 16:18:31 +0100
From: Daniel Fischer <daniel.is.fisc...@googlemail.com>
Subject: Re: [Haskell-beginners] Laziness & referential transparency
To: beginners@haskell.org

On Saturday 29 January 2011 15:16:41, Tim Baumgartner wrote:
> after reading a post by Don Stewart at
> http://cgi.cse.unsw.edu.au/~dons/blog/2008/05/16#fast, I was not
> really satisfied with the necessity to rewrite
>
>     mean xs = sum xs / fromIntegral (length xs)
>
> as an explicitly recursive function.

We'd all love a compiler clever enough to transform simple high-level
specifications into efficient code. But that's very hard, and while
we're waiting, we sometimes have to go lower-level.

> I investigated a bit further and got really confused by the following
> GHCi session:
>
>     Prelude> length [1..10^8] + length [1..10^8]
>     200000000
>     Prelude> let xs = [1..10^8]
>     Prelude> length xs + length xs
>     <interactive>: out of memory
>
> Can someone explain this

Yes.

    Prelude> length [1 .. 10^8] + length [1 .. 10^8]

creates two lists. Each is generated lazily and garbage-collected as it
is consumed by length, so the entire calculation runs in constant
space, just not particularly fast.

Since you haven't turned off the monomorphism restriction in GHCi
(otherwise the computation would run in constant space),

    Prelude> let xs = [1 .. 10^8]

defines a list xs :: [Integer] (the definition yields the constraints
Num a and Enum a; per the monomorphism restriction, the type variable a
defaults to Integer), which will be stored for later reuse.

    Prelude> length xs + length xs

then starts calculating the first length.
That means xs will be completely evaluated in the course of that (if
there is sufficient memory). Since the list is bound to a name in
scope, it doesn't get garbage-collected, so with enough memory you'll
have a list of 100 million Integers in memory (I always forget how many
words per cons cell are required, but that's something like 400 or 500
million words: 1.6 or 2 GB on a 32-bit system, 3.2 or 4 GB on a 64-bit
system, iirc). In your case, that exhausts your RAM; otherwise,
calculating the second length would be faster, since the list need not
be created a second time.

With the monomorphism restriction turned off:

    Prelude> let xs = [1 .. 10^8]
    (0.04 secs, 9183720 bytes)
    Prelude> length xs + length xs
    200000000
    (8.02 secs, 8033302172 bytes)
    Prelude> :t xs
    xs :: (Enum a, Num a) => [a]

xs remains a polymorphic value, so it doesn't get shared across
computations (since each computation could use it at a different type).

But: trying such stuff at the GHCi prompt doesn't always tell you much
about how compiled code behaves. For example:

    module Length where

    val :: Int
    val = length [1 .. 10^8] + length [1 .. 10^8]

compiled with -O, gives the core (other stuff snipped)

    Length.val :: GHC.Types.Int
    [GblId,
     Str=DmdType,
     Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
             ConLike=False, Cheap=False, Expandable=False,
             Guidance=IF_ARGS [] 6 2}]
    Length.val =
      case GHC.List.$wlen @ GHC.Integer.Type.Integer Length.val1 0
      of ww_ayU { __DEFAULT ->
      GHC.Types.I# (GHC.Prim.+# ww_ayU ww_ayU)
      }

meaning that the value length [1 .. 10^8] is reused, giving only one
list traversal (and the list can be garbage-collected as it is
consumed).

> and give a simple, reasonable rule by which I can predict such things
> in the future?

No. Just a few rules of thumb.
If an expression gets a name (by a top-level definition or a let/where
binding), it is reused, unless polymorphism prohibits that. (That's one
of the reasons for the MR: without it, stuff expected to be shared
would be recomputed.)

GHC does little common subexpression elimination, so if an expression
contains two identical subexpressions, they may not be shared. Sharing
is more likely if the subexpressions are simple and 'close together'
(for very fuzzy values of simple and close).

Unfortunately, one thing that GHC spots well is enumFromTo (sugared:
[a .. b]), meaning that long lists are often shared when, for memory
reasons, they shouldn't be.

> Isn't it a pitfall?

Yes, it is.

------------------------------

Message: 3
Date: Sat, 29 Jan 2011 16:36:09 +0100
From: Tim Baumgartner <baumgartner....@googlemail.com>
Subject: Re: [Haskell-beginners] Laziness & referential transparency
To: beginners@haskell.org, Daniel Fischer <daniel.is.fisc...@googlemail.com>

Daniel,

thanks for your explanations! For some time, I thought laziness might
be some kind of holy grail. But now I realize that it both takes a lot
of time to understand properly and has more limitations than I
expected. Another example of this: I wrote a function

    recursiveContents :: FilePath -> IO [FilePath]

and later on I thought it would be possible to run some actions on
every file from the list, in a lazy fashion. Then I had to find out
that all files had to be found before I could process the first one.
By now, I think I understand why, but it stops me from being too
enthusiastic about laziness :-(

Regards
Tim

2011/1/29 Daniel Fischer <daniel.is.fisc...@googlemail.com>
> [...]

------------------------------

Message: 4
Date: Sat, 29 Jan 2011 16:53:55 +0100
From: Daniel Fischer <daniel.is.fisc...@googlemail.com>
Subject: Re: [Haskell-beginners] Laziness & referential transparency
To: Tim Baumgartner <baumgartner....@googlemail.com>
Cc: beginners@haskell.org

On Saturday 29 January 2011 16:36:09, Tim Baumgartner wrote:
> thanks for your explanations! For some time, I thought laziness might
> be some kind of holy grail.

There is no such thing. Laziness makes it possible to write many
algorithms in a concise and elegant manner which can't be so expressed
in a strict language. But on the other hand, laziness makes it harder
to predict performance and memory usage (experience helps a lot with
that).

> But now I realize that it both takes a lot of time to understand
> properly and has more limitations than I expected.
> Another example of this: I wrote a function
>
>     recursiveContents :: FilePath -> IO [FilePath]
>
> and later on I thought it would be possible to run some actions on
> every file from the list, in a lazy fashion. Then I had to find out
> that all files had to be found before I could process the first one.

That depends. Often you can use unsafeInterleaveIO (which is not nearly
as unsafe as unsafePerformIO) to get the list lazily even in IO. Then,
if your consumption pattern allows it, you can process the first file
before the second is found. (If, however, processing the files can
influence the result of recursiveContents, you can see why there is an
"unsafe" in unsafeInterleaveIO.)

> By now, I think I understand why, but it stops me from being too
> enthusiastic about laziness :-(

Well, as with most things, there are advantages and disadvantages.

> Regards
> Tim

------------------------------

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

End of Beginners Digest, Vol 31, Issue 37
*****************************************
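The unsafeInterleaveIO approach Daniel suggests in Message 4 might look roughly like this in practice. `lazyContents` is a hypothetical name and structure, since the thread never shows Tim's actual recursiveContents:

```haskell
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO)

-- A lazy directory walk: each directory is listed only when the
-- consumer demands that part of the result list, so files can be
-- processed before the whole traversal has finished.  The usual caveat
-- applies: if processing the files modifies the directory tree, the
-- results are unpredictable; that is the "unsafe" in unsafeInterleaveIO.
lazyContents :: FilePath -> IO [FilePath]
lazyContents dir = unsafeInterleaveIO $ do
  names <- getDirectoryContents dir
  let proper = filter (`notElem` [".", ".."]) names
  paths <- mapM descend proper
  return (concat paths)
  where
    descend name = do
      let path = dir </> name
      isDir <- doesDirectoryExist path
      if isDir then lazyContents path else return [path]
```

Because each recursive call is wrapped in unsafeInterleaveIO, `take 1 <$> lazyContents root` only reads as many directories as are needed to produce the first file.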