Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Andrea Rossato wrote: > Now, the state will not be entirely consumed/evaluated by the user, > and so it will not become garbage. Am I right? No. The state cannot become garbage, because there is still a reference to it. As long as runStateT has not returned, any part of the state can still be accessed, so it is not garbage. Completely evaluating the state will not reduce memory consumption in your case, because the list of lists won't be substantially smaller that the thunk to create it. In fact, evaluating this thunk will consume memory. > Where should I force evaluation? You can't. Your state really is that large, at least in the toy example. You'd need a different data structure ((Array Int ByteString) or (Map Int ByteString) come to mind) and then make that strict. Udo. -- The Seventh Commandments for Technicians: Work thou not on energized equipment, for if thou dost, thy fellow workers will surely buy beers for thy widow and console her in other ways. signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Hello Bullat, first of all, thanks for your lengthy and clear explanation. On Sun, Oct 22, 2006 at 04:08:49PM +0400, Bulat Ziganshin wrote: > f a b = let x = a*b > y = a+b > in x `seq` y `seq` (x,y) > > this f definition will not evaluate x and y automatically. BUT its > returned value is not (x,y). its returned value is x `seq` y `seq` (x,y) > and when further computation try to use it in any way, it can't put > hands on the pair before it will evaluate x and y values. are you > understand? Yes, I do understand. But, as far as I know, "seq" will just evaluate x and y enough to see if they are not bottom. So, if x and y are a deep data structure, they won't be evaluated entirely, right? That is to say: > to get real advantage, you need to build your value sequentially in > monad and force evaluation of each step results: > > main = do let x = f 1 > return $! x > let y = f 2 > return $! y > let z = f 3 > return $! z > let a = T x y z > .. ... > > setState ns = > > modify (\s -> s {mystate = ns}) > > here you modify state, but don't ensure that string list is evaluated > on both levels. well, it will be ok if you ensure evaluation at _each_ > call to this function. alternatively, you can force evaluation before > assignment by: > > setState ns = do > return $! map length ns > modify (\s -> s {mystate = ns}) > this is the crucial point. You are forcing evaluation with $! map length, I'm doing it with writeFile. I do not see very much difference. That's an ad hoc solution. Since I need to write the state, instead of (needlessly) looking for the length of it's members, I write it...;-) By the way, the problem is not ns, but s, the old state. The new state has been evaluated by the code below (when we display the number of lines). So you need to change your code with: do s <- getState modify (\s -> s {mystate = ns}) otherwise you are going to have the same space leak as my original code had. In other word, it is possible to have a user who keeps on loading files without looking at any line (in this case getState is never called and so there is no "return $! mystate s"!). This produces the same as: setState ns = do s <- getState -- that does: return $ mystate s liftIO $ writeFile "/dev/null" s modify (\s -> s {mystate = ns}) > if StateT is strict monad transformer, this code don't have space > leaks. you create thunks in two places, in one place you already > evaluate it, and i wrote what to do in second place. Yes, indeed. But just because we forced evaluation in a way that `seq` cannot do. Am I right? Thanks a lot for you patience. You did help me a lot. And this is not the first time. I appreciate. Really. Best regards, Andrea ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Hello! On Sun, Oct 22, 2006 at 12:27:05AM +0400, Bulat Ziganshin wrote: > as Udo said, it should be better to evaluate thunks just when they are > created, by using proper 'seq' calls. While I understand why you and Udo are right, still it is difficult for me to related this discussion to my code. So I wrote a small example that reproduces my problem, with the hope that this will help me understand your point. This is my specific problem, I believe. There is a StateT monad with a list of string as a state. The list is populated with the lines of a file entered by the user. The user may read some lines of this file or request another one: - lFilename will load a file - sNumber will show a line number. The input file is evaluated at the very beginning (in my case that is forced by the xml parser, as far as I understand) and stored as the state. Now, the state will not be entirely consumed/evaluated by the user, and so it will not become garbage. Am I right? Where should I force evaluation? Is it clear my confusion (sorry for this kind of nasty recursion...;-)? Thanks for your kind attention. Best regards, Andrea here's the code: -- module Main where import Control.Monad.State import IO data Mystate = Mystate {mystate :: [String]} type SL = StateT Mystate IO getState :: SL [String] getState = do s <- get return $ mystate s setState ns = modify (\s -> s {mystate = ns}) getFile :: String -> SL () getFile p = do f <- liftIO $ readFile p let lns = lines f -- forces evaluation of lns liftIO $ putStrLn $ "Number of lines: " ++ show (length lns) setState lns promptLoop showLine :: Int -> SL () showLine nr = do s <- getState liftIO $ putStrLn $ s !! nr promptLoop promptStr = "lFilename [load the file Filename] - sNr [show the line Nr of Filename] - q to quit" promptLoop :: SL () promptLoop = do liftIO $ putStrLn promptStr str <- liftIO getLine case str of ('l':ss) -> getFile ss ('s':nr) -> showLine (read nr) ('q':[]) -> liftIO $ return () _ -> promptLoop main = evalStateT promptLoop $ Mystate [] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
On Sat, Oct 21, 2006 at 09:09:13PM +0200, Andrea Rossato wrote: > So, it's parsec, on your side...;-) I sorry, I was a bit confused when I wrote that. I confused you for another person, obviously. Sorry about that. Andrea pgpx3F8PsFdVW.pgp Description: PGP signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Hallo! Thanks a lot for stopping by. On Sat, Oct 21, 2006 at 06:41:32PM +0200, Udo Stenzel wrote: > The correct solution however, is the application of 'seq' at the right > places. To understand where these are, perform a simulation of > Haskell's reduction strategy on paper. I will definitely try that. > > 30 Mega used for reading a feed is a number that I seem to get. > > Depends on what you're doing with the data. If you scan a stream of > Chars exactly once, the space requirement per Char is next to > irrelevant. If you're keeping lots of Strings around, using > PackedStrings will help (and be sure to pack strictly). But I actually > suspect, you are running a backtracking parser over your input, so the > whole input is read into a String and cannot be disposed of as long as > the parser might backtrack. If this is Parsec, you need to remove a > redundant 'try'. If it is the Read class, you need to replace it by > Parsec or ReadP... No. I'm actually using the ReadDocument module of HXT for reading my input and writeDocument for writing. So, it's parsec, on your side...;-) The application I'm writing is basically a HXT application, with an hscurser gui. If you have some hints on how to make memory consumption go down that would be great. Thanks for your kind attention. Andrea pgp6mLPTDHpCe.pgp Description: PGP signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Andrea Rossato wrote: > I did not get an appreciable improvement with performGC, as you can > see from here: > http://gorgias.mine.nu/haskell/a.out.withPerformGC.ps > > But I found a solution: just write the opml state component to a file! Obviously the values in question were not garbage, rather these were unevaluated thunks. Writing the data causes it to be evaluated, the thunks become garbage and get collected. The correct solution however, is the application of 'seq' at the right places. To understand where these are, perform a simulation of Haskell's reduction strategy on paper. > > second, each Char in ghc occupies 12 bytes (!) > > > > multiplying this at 2.5 or even 3 factor which i described in previous > > letter means, say, 30 mb used > > 30 Mega used for reading a feed is a number that I seem to get. Depends on what you're doing with the data. If you scan a stream of Chars exactly once, the space requirement per Char is next to irrelevant. If you're keeping lots of Strings around, using PackedStrings will help (and be sure to pack strictly). But I actually suspect, you are running a backtracking parser over your input, so the whole input is read into a String and cannot be disposed of as long as the parser might backtrack. If this is Parsec, you need to remove a redundant 'try'. If it is the Read class, you need to replace it by Parsec or ReadP... Udo. -- Ich glaub vorher defragmentier ich meine Festplatte, schmeiß alle CDs weg und installier Löwenzahn, Teletubbies, Pokemon usw. auf meinem Rechner. Dann lauf ich amok. Das wird den Psychologen EINIGES zu denken geben. signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Hallo Bulat! On Fri, Oct 20, 2006 at 10:21:51PM +0400, Bulat Ziganshin wrote: > first, GC don't occurs automatically when you close file. you can help > GHC by using performGC from System.Mem. i does it in my own prog I did not get an appreciable improvement with performGC, as you can see from here: http://gorgias.mine.nu/haskell/a.out.withPerformGC.ps But I found a solution: just write the opml state component to a file! At first I decided to write it and reload it: http://gorgias.mine.nu/haskell/a.out.withFileReload.ps But I get a better result by just writing it (to /dev/null or to a real file): http://gorgias.mine.nu/haskell/a.out.withFileWrite.ps This way everything is garbage collected and the graph is what I would expect. Now, this is a good solution since the opml file is generally small (I have a file with 100 subscribed feeds and it is just 46Kbyte) and it stores information on the folder layout, so it must be saved anyway. > second, each Char in ghc occupies 12 bytes (!), so each of your files > occupies about 5 mb of memory. if you will count the previous problem, > the 2 or 3 files can be hels in memory at the same time (just because > they was not yet GCd) so memory usage may become, say, 10 mb > > multiplying this at 2.5 or even 3 factor which i described in previous > letter means, say, 30 mb used 30 Mega used for reading a feed is a number that I seem to get. I also prevent the user from opening more then one feed at once. When a new feed is opened, the old feed gets removed, the opml file is written, the old stuff is garbage collected and the new stuff is added. Now, the memory fingerprint of the program is given by the most memory consuming feed. It is running at about 60-65 Mega. Not that bad, compared to the previous situation. Thank you very much for your suggestions. Regards Andrea ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
On Thu, Oct 19, 2006 at 06:23:35PM +0400, Bulat Ziganshin wrote: > Hello Andrea, > > Wednesday, October 18, 2006, 9:34:28 PM, you wrote: > > > solution? Or just some hints on the kind of problem I'm facing: is it > > related to strictness/laziness, or it's just that I did not understand > > a single bit of how garbage collection works in Haskell? > > i think, the second. unfortunately, i don't know good introduction to > the actual GC implementation although i can give you a pair of > not-so-friendly references: [...[ > > shortly speaking, memory allocated by GHC never shrinks. Hello Bulat, well, you gave me a wonderfully clear introduction to Haskell GC, and now I have a better understanding of the output of the various profiling I'm doing. Thank you very much! Still, I cannot understand my specific problem, that is to say, why the function that reads a file retains so much memory. I did some test and the results are puzzling: - I tried reading the feed and directly converting it into the opml chunk to be inserted into the opml component of my StateT monad. The problem becomes far worse. Here the output of a heap profile: http://gorgias.mine.nu/haskell/a.out.feed2opml.ps as you can see, after opening one feed (397868 bytes), closing it, opening another one (410052 bytes), closing it and reopening the first one brings memory consumption to 152 Mega. Using the intermediate datatype (that is to say, reading the feed, transforming it into my datatype and then to the opml tree), reduces the problem: http://gorgias.mine.nu/haskell/a.out.feed2feedNotStrict.ps only 92 Mega of memory consumption for the very same operations. Making the intermediate datatype strict gives almost the same results: http://gorgias.mine.nu/haskell/a.out.feed2feedStrict.ps 98 Mega. Now, I come to believe the file reading is indeed strict, and that my problem could be related to StateT laziness. Does this makes sense? I'm now going to try to implement my opml state as a IORef and use a ReaderT monad to see if something new happens. > ps: if your program uses a lot if string, FPS will be a very great. it > don;t change the GC behavior, just makes everything 10 times smaller > :) yes, but I'm using HXT and this is using normal strings to store xml text nodes. So I could have some improvements with IO but not that much in memory consumption, unless I totally change my implementation. Anyway, even if I could reduce from 152 to 15 mega the memory consumption for reading 2 feeds, I'd be running out of memory, on my laptop, in one day instead that 5 minutes. Anyway I should face the fact that it is not the string implementation in Haskell that is causing the problem. The problem is probably me! Thanks for your kind attention. Regards Andrea ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
Hello Andrea, Wednesday, October 18, 2006, 9:34:28 PM, you wrote: > solution? Or just some hints on the kind of problem I'm facing: is it > related to strictness/laziness, or it's just that I did not understand > a single bit of how garbage collection works in Haskell? i think, the second. unfortunately, i don't know good introduction to the actual GC implementation although i can give you a pair of not-so-friendly references: Generational garbage collection for Haskell http://research.microsoft.com/~simonpj/Papers/gen-gc-for-haskell.ps.gz GHC Commentary about GC http://hackage.haskell.org/trac/ghc/wiki/GarbageCollectorNotes http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage shortly speaking, memory allocated by GHC never shrinks. GHC by default use 2-stage memory allocator. First, memory for new objects are allocated in 256k chunks ("nursery", whose size controlled by RTS -A option). when these 256 kb is exhausted, "minor GC" occurs. minor GC scans nursery and moves to the global heap all objects that are still alive. btw, it's great idea because minor GC runs very fast and most objects are died at this stage and don't participate in slow major GCs when all the memory, currently allocated for the heap, exhausted - major GC occurs. it scans all objects in the heap and copy alive ones to the new memory blocks that are allocated during this process. so, for example, if just before major GC we have 30 mb heap where only 10 mb alive, then during GC we will alloc 10 mb more, copy alive obejcts there and at the end will free the original 30 megs but these 30 megs don't returned to the OS! instead, they becomes free part of the heap that then filled with new objects. so, after this major GC we have 40 megs heap of which 30 megs are free. next major GC will occur when these 30 megs will be filled by objects that survived after minor GCs well, it's default scenario. with compacting GC or with -H, -F, -G the scenario will slightly change. you can look at behavior of memory manager using +RTS -Sfile option. here is example: AllocCollectLiveGCGC TOT TOT Page Flts bytes bytes bytes user elapuserelap ... 262116266240 119095740 0.00 0.00 23.72 47.3300 (Gen: 0) 262116266240 119269488 0.01 0.01 23.73 47.3400 (Gen: 0) 262116266240 119443236 0.00 0.00 23.73 47.3400 (Gen: 0) 262116266240 119616988 0.01 0.01 23.74 47.3500 (Gen: 0) 262116 119365632 81473776 1.54 1.66 25.29 49.0100 (Gen: 1) 262116266240 81647512 0.00 0.00 25.29 49.0100 (Gen: 0) 262116266240 81821260 0.00 0.00 25.29 49.0100 (Gen: 0) ... each line here reports stats of one GC. '1' in last column means major GC, other are minor ones. third column displays current heap size. as you can see, after each minor GC heap size is increased at 100-200 kb. this means that from 256kb allocated only 100-200k was survived to be moved in the main heap. Heap size was 120 mb and when heap was filled, major GC occurs. after it ends, only 81 mb of data survived, the heap was extended to 120+80=200 megs and next minor GCs continues to fill it up. so, each major GC extends heap (in absence of additional RTS options) and nothing between major GCs can do it. because it was last GC in this run, its stats defined stats of entire program run: 989,191,980 bytes allocated in the heap 467,301,272 bytes copied during GC 81,473,776 bytes maximum residency (9 sample(s)) 3584 collections in generation 0 ( 6.33s) 9 collections in generation 1 ( 3.23s) 194 Mb total memory in use here GHC reports that maximum amount of really used memory was 81 megs (GHC can determine real memory usage only at major GCs), while memory allocated by RTS was 200 megs. so signal/noise ratio is 40% here. sorry, but it is rather typical. there are various techniques to fight with it but in general any memory allocation technique involves a lot of vanished memory. at least GHC GC is enough efficient in terms of time: MUT time 21.73s ( 45.57s elapsed) GCtime9.56s ( 9.94s elapsed) EXIT time0.00s ( 0.00s elapsed) Total time 31.31s ( 55.51s elapsed) %GC time 30.6% (17.9% elapsed) ps: if your program uses a lot if string, FPS will be a very great. it don;t change the GC behavior, just makes everything 10 times smaller :) -- Best regards, Bulatmailto:[EMAIL PROTECTED] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] memory, garbage collection and other newbie's issues
On 10/18/06, Andrea Rossato <[EMAIL PROTECTED]> wrote: Hi! I'm a newbie and, as a learning experience, I'm writing a feed reader with hscurses and hxt. For the present time the feed reader just reads a Liferea cache but, as you can imagine, I'm running into the usual newbie problems of memory consumption and garbage collection, probably (I'm not sure) related to strictness/laziness. Even though I spent a couple of hours search the mailing list archives, I did not come up with something I can relate to, so I'll try to explain my problem. The feed reader, that should be compatible with Liferea, takes an opml (1.0) file, that stores information on folders and subscribed feeds. It uses it as the major component of the state of a ST monad, after adding some attributes used by the reader UI. The UI, that uses the widget library of hscurses and is derived from the Contact Manager example, will just display this opml file, and every UI event (collapsing/expanding of folders, displaying feeds, tagging, flagging, and so on) is just an XML transformation of this opml state component. So, when the feed reader boots, only the layout of folders and subscribed feeds is presented to the user. When the user selects a feed do be displayed, the cached file containing up to 100 saved posts, is read and transformed into a data type (called Feed, obviously). After that this data type is transformed into an opml (xml) tree, that is inserted as a child in the appropriate place of the opml state component. Moreover the parent element of the opml state component (which holds the original information of the subscribed feed) is edited for adding general feed information (such as last update, feed's attributes, and so on) retrieved by reading the file. When the user collapses the feed, the added opml chunk is deleted from the state component (but not the added information to the parent of this chunk). Now, I would expect that after the opml chunk is deleted all the memory allocated for reading the cached file would be garbage collected. This is not happening, so, every time you open (or reopen) a feed, the used memory of the feed reader increases, and never decreases. After profiling I've seen that the problem is occurring in the function that reads the cached file: loadFeed :: String -> IO [Feed] readFeed id = do [a] <- runX $ readDocument [(a_validate, v_0)] (cachePath ++ id) return $ runLA toFeed a What this function does is reading the file with: h <- openFile ... hGetContents h and applying some XML filters to get the Feed type populated with the needed information. I tried making the function strict with $!. I tried using fps. It doesn't change this behaviour, obviously. Have you tried adding strictness annotations to your data type? For example, something like this: data Foo a = Foo !a If you do this for the datatypes you want to get deleted, my understanding is that it will help. I think this helps because the garbage collector will have one less excuse for not cleaning up the values since they will be forced to exist completely when they are created instead of only existing partially (in which case, if I understand correctly, the garbage collector doesn't throw away partially constructed values). You said you did profiling, have you done retainer profiling? I haven't used it myself, but I think that it is designed to help you identify where the memory is being leaked. I've been using haskell on and off for over a year now but still consider myself to be mostly a newbie as well. I started using haskell at work recently on a project and I've found that although haskell makes it so that I don't spend much time debugging (thanks referential transparency!), testing (thanks quickcheck!), or writing code (thanks haskell in general!), I do spend a lot of time profiling and optimizing for time/space. I think it's interesting that the development is mostly quick and easy but polishing and making it ready for general use can still be hard because of performance issues. Although, I'd rather spend my time optimizing something that works than debugging pointer problems. HTH, Jason ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe