Re: [Haskell-cafe] puzzling memory leak? (GHC)
On Oct 9, 2005, at 11:32 PM, Ben Lippmeier wrote:

> This sounds like a text-book space-leak caused by lazy evaluation. In a
> lazy language, function evaluation is driven by the need to perform IO.
> Because the first version of your program never prints the parsed
> structure, the parser doesn't completely parse it. This implies that the
> system needs to hang onto all the input data (as well as all the
> partially evaluated calls to the parser) in case it needs it later on.

I naively believed (or hoped) that, so long as I the programmer took some care, the evaluator would be smart enough to discard unaccessed computations and values and thus avoid a space leak. Hence the thing I worried about was explicit references to data structures (say, the head of a very long list) that prevent objects from being reclaimed. Now it seems I need to also worry about the hidden aspects of the internal implementation of lazy evaluation. What intuition should be my guide about the proper way of employing lazy evaluation?

It's not yet clear to me what you mean by "the system needs to hang onto all the input data". It seems counterintuitive for the evaluator to hang onto partially evaluated functions and input data when it can be proven (through static analysis) that references to that data can no longer exist in the remainder of the program execution; for example, consider

  case head $ drop 500000 $ parseArts ... of
    Right x -> hPutArts stderr x

in which the first 500,000 elements in the result of parseArts can't ever be referenced after 'drop' executes, and yet the space leak still happens (presumably while trying to skip over those 500,000 elements). Have I understood you correctly that this is the unfortunate behavior of lazy evaluation?

> The default string representation in Haskell is pretty abysmal, and
> having it use hundreds of megs to represent, say, a 10 meg file is not
> too surprising.

Each file I process is 70MB, so inefficient representation could be a problem if the input data is buffered.
> My advice is that if you don't want to fool around with it, just swallow
> hard, then change fixArts to do a hPutArts to /dev/null. Either that or
> 1) go read about the DeepSeq library. 2) add strictness annotations to
> your structure definition.

Thanks for the advice; I'll look into both.

--Young

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
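[The two suggestions above can be sketched as follows. The `Arts` record here is hypothetical; the real record layout never appears in the thread. Strictness annotations (`!`) force a field to weak head normal form when the constructor is evaluated, and `seq` forces each list element as it is produced; DeepSeq goes further and forces the entire structure.]

```haskell
-- Hypothetical record with strictness annotations: the bangs mean a
-- large unevaluated thunk cannot hide inside a constructed Arts value.
data Arts = Arts
  { artsSize :: !Int
  , artsBody :: !String
  }

-- Force each element of a list to weak head normal form as its cons
-- cell is demanded, using seq.
forceList :: [a] -> [a]
forceList []     = []
forceList (x:xs) = x `seq` (x : forceList xs)
```

[Note that `!String` and `seq` only evaluate to the outermost constructor; for a deeply nested structure the DeepSeq library's `deepseq` is the heavier hammer.]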
RE: [Haskell-cafe] puzzling memory leak? (GHC)
On 11 October 2005 07:34, Young Hyun wrote:

> What intuition should be my guide about the proper way of employing
> lazy evaluation? It's not yet clear to me what you mean by "the system
> needs to hang onto all the input data". It seems counterintuitive for
> the evaluator to hang onto partially evaluated functions and input
> data when it can be proven (through static analysis) that references
> to that data can no longer exist in the remainder of the program
> execution; for example, consider
>
>   case head $ drop 500000 $ parseArts ... of
>     Right x -> hPutArts stderr x

This example should not have a space leak (try it). Space leaks arise when large unevaluated expressions build up, or when there are unevaluated expressions that refer to large amounts of data. The solution is usually to insert some strictness, to force evaluation of the offending expressions, assuming the evaluated representation is smaller than the unevaluated one.

I suggest you try out space profiling (with GHC or nhc98), and try various modifications to your code to get a handle on what's happening. It's hard to tell exactly where the space leak is in your original code, because it depends on the actual definition of parseArts. Ben's analysis is definitely plausible. Take a look at the code and ask yourself: does each element of the list refer to previous elements, or to an expression accumulated while traversing the list? If this is the case, and you never actually evaluate the head of the list, then the entire list will be retained.

A good way to begin to understand space leaks is to study foldl (see many previous threads on this list).

Cheers,
Simon
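[The foldl example Simon alludes to is the classic one: lazy foldl builds a chain of a million suspended additions before any arithmetic happens, while the strict variant foldl' forces the accumulator at each step and runs in constant space. Both produce the same answer; only the space behavior differs.]

```haskell
import Data.List (foldl')

-- Lazy foldl: the accumulator becomes the unevaluated expression
-- (...((0 + 1) + 2) + ...) + 1000000, held entirely in memory until
-- the final result is demanded.
leaky :: Integer
leaky = foldl (+) 0 [1 .. 1000000]

-- Strict foldl': the accumulator is forced at every step, so only one
-- partial sum is live at a time.
strict :: Integer
strict = foldl' (+) 0 [1 .. 1000000]
```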
Re: [Haskell-cafe] puzzling memory leak? (GHC)
Young Hyun wrote:

> Here's what the program does. It reads a binary file containing a
> large number of varying-length structured records. [...] Unless I'm
> badly mistaken, nothing about fixArts suggests that it would leak
> memory. So, it would appear parseArts is somehow to blame for the
> memory leak, but this is where it gets weird. If I just slightly
> change fixArts (see below), then I no longer get a memory leak (the
> process size stays at just over 3MB):
>
>   fixArts ((Right x):xs) = do hPutArts stderr x
>                               fixArts xs

Most likely, you think your parser is building small data structures, when in reality it builds large and complicated suspended computations that never run and just take up space. As soon as you demand their result (hPutArts does this), they run and free up the space.

GHC itself cannot help you much. Maybe it *could* determine that all these results are really demanded eventually and that it won't hurt to compute them earlier, but in practice this is too hard to do. You'll need to give some hints. You should change the fields in your record to strict types (!Integer instead of Integer), and you should force evaluation of these structures by placing a `seq' at the appropriate place. Unfortunately, finding this place is not easy. Probably the best way is to simulate lazy evaluation in your head and see where it goes wrong.

  parseArts (x:xs) ... = (createArts x) : parseArts xs

In this code the result of createArts is not demanded and is simply put into a potentially large data structure. It might help to change it to the following:

  parseArts (x:xs) ... = ((:) $! createArts x) (parseArts xs)

Udo.

--
in the middle of a discussion about the Evil Mangler in GHC 5.04:
shapr: the evil mangler uses *perl* ??
ChilliX: yes...
ChilliX: it is Evil after all
Re: [Haskell-cafe] puzzling memory leak? (GHC)
Hi Young,

Just to give a somewhat different perspective: sometimes increasing laziness is helpful for avoiding space problems--not really space leaks so much as space wastage. In this case, you probably can't afford to hold the entire contents in memory at any given time. So you need to be certain that the input is no longer needed once it has been consumed (which is reflected in the strictness suggestions given by others), but you also (most likely) don't want to hold the entire list of Arts in memory either, since that'll also take a huge amount of memory. That requires that parseArts generate its output lazily and that fixArts consume it properly. Both of those appear to be the case from the code you outlined, but I figured I'd point out that strictness in the wrong circumstances can be as bad as laziness for your memory usage.

Assuming the following is correct...

  parseArts (x:xs) ... = (createArts x) : parseArts xs

then it would be worth checking that

  main = do contents <- getContents
            fixArts [head $ parseArts contents [] (Map.singleton offset 0)]

takes very little memory. If it takes a large amount of memory, then I think you may have a problem with parseArts being insufficiently lazy.

--
David Roundy
http://www.darcs.net
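[David's laziness test can be illustrated with a toy producer (the names here are stand-ins, not code from the thread): if the parser yields its output lazily, then demanding only the head never builds the rest of the list, and the check works even on unbounded input.]

```haskell
-- A lazy producer: each output element is yielded as soon as the
-- corresponding input element is consumed.
parseStream :: [Int] -> [Int]
parseStream []     = []
parseStream (x:xs) = (x * 2) : parseStream xs

-- Demanding only the head forces exactly one step of the parse; this
-- terminates even though the input list is infinite.
firstResult :: Int
firstResult = head (parseStream [1 ..])
```

[If the analogous check on the real parseArts allocates a lot, the parser is doing too much work, or retaining too much state, before producing its first element.]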
Re: [Haskell-cafe] puzzling memory leak? (GHC)
Young,

This sounds like a text-book space-leak caused by lazy evaluation. In a lazy language, function evaluation is driven by the need to perform IO. Because the first version of your program never prints the parsed structure, the parser doesn't completely parse it. This implies that the system needs to hang onto all the input data (as well as all the partially evaluated calls to the parser) in case it needs it later on. The default string representation in Haskell is pretty abysmal, and having it use hundreds of megs to represent, say, a 10 meg file is not too surprising.

By modifying your fixArts function to print the parsed structure, you've forced the system to actually finish evaluating the parser, which allows the input data to be garbage collected.

My advice is that if you don't want to fool around with it, just swallow hard, then change fixArts to do a hPutArts to /dev/null. Either that, or 1) go read about the DeepSeq library, or 2) add strictness annotations to your structure definition.

Ben.

  fixArts ((Right x):xs) = do hPutArts stderr x
                              fixArts xs