Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-22 Thread Udo Stenzel
Andrea Rossato wrote:
> Now, the state will not be entirely consumed/evaluated by the user,
> and so it will not become garbage. Am I right?

No.  The state cannot become garbage, because there is still a reference
to it.  As long as runStateT has not returned, any part of the state can
still be accessed, so it is not garbage.  Completely evaluating the
state will not reduce memory consumption in your case, because the list
of lists won't be substantially smaller than the thunk that creates it.  In
fact, evaluating this thunk will consume memory.

 
> Where should I force evaluation? 

You can't.  Your state really is that large, at least in the toy
example.  You'd need a different data structure ((Array Int ByteString)
or (Map Int ByteString) come to mind) and then make that strict.
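Something along these lines (a rough sketch; loadLines and lookupLine
are made-up names, and it assumes Data.ByteString from the fps
package):

import qualified Data.ByteString.Char8 as B
import qualified Data.Map as M

type LineStore = M.Map Int B.ByteString

-- B.readFile reads the whole file strictly into one packed buffer, so
-- no lazy thunk chain survives; forcing the Map's size forces its
-- spine as well.
loadLines :: FilePath -> IO LineStore
loadLines path = do
    contents <- B.readFile path
    let store = M.fromList (zip [0..] (B.lines contents))
    M.size store `seq` return store

lookupLine :: Int -> LineStore -> Maybe B.ByteString
lookupLine = M.lookup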


Udo.
-- 
The Seventh Commandments for Technicians:
Work thou not on energized equipment, for if thou dost, thy
fellow workers will surely buy beers for thy widow and console
her in other ways.




Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-22 Thread Andrea Rossato
Hello Bulat,

first of all, thanks for your lengthy and clear explanation. 

On Sun, Oct 22, 2006 at 04:08:49PM +0400, Bulat Ziganshin wrote:
> f a b = let x = a*b
>             y = a+b
>         in x `seq` y `seq` (x,y)
> 
> this definition of f will not evaluate x and y automatically. BUT its
> returned value is not (x,y); its returned value is x `seq` y `seq` (x,y),
> and when a further computation tries to use it in any way, it can't
> get its hands on the pair before it has evaluated x and y. Do you
> understand?

Yes, I do understand. But, as far as I know, "seq" will just evaluate x
and y to weak head normal form, enough to see that they are not bottom.
So, if x and y are deep data structures, they won't be evaluated
entirely, right?
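For example, a small self-contained test of this point (not from my
program, just an illustration):

main :: IO ()
main = do
    let xs = [1, undefined, 3] :: [Int]
    -- seq forces only the outermost constructor (:), so this succeeds:
    xs `seq` putStrLn "WHNF: the undefined element was not touched"
    -- forcing the length walks the spine, but the elements stay thunks:
    length xs `seq` putStrLn "spine forced; elements still unevaluated"
    -- print (xs !! 1)   -- this, by contrast, would hit the undefined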

That is to say:

> to get a real advantage, you need to build your value sequentially in
> a monad and force the evaluation of each step's result:
> 
> main = do let x = f 1
>           return $! x
>           let y = f 2
>           return $! y
>           let z = f 3
>           return $! z
>           let a = T x y z
>           ..


...

> > setState ns =
> >     modify (\s -> s {mystate = ns})
> 
> here you modify the state, but don't ensure that the string list is
> evaluated on both levels. well, it will be ok if you ensure evaluation
> at _each_ call to this function. alternatively, you can force
> evaluation before the assignment with:
> 
>  setState ns = do
>      return $! map length ns
>      modify (\s -> s {mystate = ns})
> 
> 

this is the crucial point. You are forcing evaluation with $! map
length, I'm doing it with writeFile. I do not see much difference:
both are ad hoc solutions. Since I need to write the state anyway,
instead of (needlessly) computing the length of its members, I write
it...;-)
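As an aside, `return $!` only forces to weak head normal form; a full
force of the list on both levels can be written with deepseq (a package
that appeared after this thread). A minimal sketch, reusing the
Mystate/SL types from the example program quoted below:

import Control.DeepSeq (deepseq)
import Control.Monad.State

data Mystate = Mystate { mystate :: [String] }
type SL = StateT Mystate IO

-- deepseq forces ns to normal form (every cons cell and every Char)
-- before the state is updated, so no thunk chain survives in the state.
setState :: [String] -> SL ()
setState ns = ns `deepseq` modify (\s -> s { mystate = ns })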

By the way, the problem is not ns, but s, the old state. The new state
has been evaluated by the code below (when we display the number of
lines). So you need to change your code to:

do s <- getState
   modify (\st -> st {mystate = ns})

otherwise you are going to have the same space leak as my original
code had. In other words, it is possible that a user keeps on
loading files without looking at any line (in this case getState is
never called and so there is no "return $ mystate s"!).

This produces the same effect as:

setState ns =
    do s <- getState -- that does: return $ mystate s
       liftIO $ writeFile "/dev/null" (unlines s)
       modify (\st -> st {mystate = ns})

> if StateT is a strict monad transformer, this code doesn't have space
> leaks. you create thunks in two places; in one place you already
> evaluate them, and I wrote what to do in the second place.

Yes, indeed. But only because we forced evaluation in a way that a bare
`seq` cannot. Am I right?
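A note of caution here: the "strict" StateT (Control.Monad.State.Strict
in mtl) is strict in the result pair of each step, not in the state
value itself, so a deeply structured state still needs an explicit
force. A later addition to the libraries, modify', at least forces the
new state to weak head normal form. A minimal sketch:

import Control.Monad.State.Strict (StateT, modify')

-- modify' applies the function and forces the new state to WHNF
-- before continuing, unlike the lazy modify:
bump :: Monad m => StateT Int m ()
bump = modify' (+ 1)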

Thanks a lot for your patience. You did help me a lot. And this is not
the first time. I appreciate it. Really.
Best regards,
Andrea


Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-22 Thread Andrea Rossato
Hello!

On Sun, Oct 22, 2006 at 12:27:05AM +0400, Bulat Ziganshin wrote:
> as Udo said, it should be better to evaluate thunks just when they are
> created, by using proper 'seq' calls.

While I understand why you and Udo are right, it is still difficult
for me to relate this discussion to my code. So I wrote a small
example that reproduces my problem, in the hope that this will help
me understand your point.

This is my specific problem, I believe.

There is a StateT monad with a list of strings as its state.
The list is populated with the lines of a file whose name is entered
by the user. The user may read some lines of this file or request
another one:
- lFilename will load the file Filename
- sNumber will show the line Number.

The input file is evaluated at the very beginning (in my case that is
forced by the xml parser, as far as I understand) and stored as the
state.

Now, the state will not be entirely consumed/evaluated by the user,
and so it will not become garbage. Am I right?

Where should I force evaluation? 

Is my confusion clear (sorry for this kind of nasty recursion...;-)?

Thanks for your kind attention.

Best regards,
Andrea

here's the code:

--
module Main where

import Control.Monad.State

data Mystate = Mystate { mystate :: [String] }

type SL = StateT Mystate IO

getState :: SL [String]
getState =
    do s <- get
       return $ mystate s

setState :: [String] -> SL ()
setState ns =
    modify (\s -> s {mystate = ns})

getFile :: String -> SL ()
getFile p =
    do f <- liftIO $ readFile p
       let lns = lines f
       -- forces evaluation of the spine of lns (not the line contents)
       liftIO $ putStrLn $ "Number of lines: " ++ show (length lns)
       setState lns
       promptLoop

showLine :: Int -> SL ()
showLine nr =
    do s <- getState
       liftIO $ putStrLn $ s !! nr
       promptLoop

promptStr :: String
promptStr = "lFilename [load the file Filename] - sNr [show the line Nr of Filename] - q to quit"

promptLoop :: SL ()
promptLoop =
    do liftIO $ putStrLn promptStr
       str <- liftIO getLine
       case str of
         ('l':ss) -> getFile ss
         ('s':nr) -> showLine (read nr)
         "q"      -> return ()
         _        -> promptLoop

main :: IO ()
main = evalStateT promptLoop $ Mystate []
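A variant of getFile with the fix discussed above (my sketch, using the
names of this program): touch the old state before overwriting it, so
its thunks can be collected even if the user never asks for a line.
Forcing the length only touches the spine; a deep force would need
something like deepseq.

import Control.Exception (evaluate)

getFile' :: String -> SL ()
getFile' p =
    do old <- getState
       _ <- liftIO $ evaluate (length old)  -- force the old state's spine
       f <- liftIO $ readFile p
       let lns = lines f
       liftIO $ putStrLn $ "Number of lines: " ++ show (length lns)
       setState lns
       promptLoop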


Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-21 Thread Andrea Rossato
On Sat, Oct 21, 2006 at 09:09:13PM +0200, Andrea Rossato wrote:
> So, it's parsec, on your side...;-)

I'm sorry, I was a bit confused when I wrote that.
I mistook you for another person, obviously.
Sorry about that. Andrea




Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-21 Thread Andrea Rossato
Hello!

Thanks a lot for stopping by.

On Sat, Oct 21, 2006 at 06:41:32PM +0200, Udo Stenzel wrote:
> The correct solution, however, is the application of 'seq' at the right
> places.  To understand where these are, perform a simulation of
> Haskell's reduction strategy on paper.

I will definitely try that.

> > 30 Mega used for reading a feed is a number that I seem to get. 
> 
> Depends on what you're doing with the data.  If you scan a stream of
> Chars exactly once, the space requirement per Char is next to
> irrelevant.  If you're keeping lots of Strings around, using
> PackedStrings will help (and be sure to pack strictly).  But I actually
> suspect, you are running a backtracking parser over your input, so the
> whole input is read into a String and cannot be disposed of as long as
> the parser might backtrack.  If this is Parsec, you need to remove a
> redundant 'try'.  If it is the Read class, you need to replace it by
> Parsec or ReadP...

No. I'm actually using the ReadDocument module of HXT for reading my
input and writeDocument for writing. So, it's parsec, on your
side...;-)

The application I'm writing is basically an HXT application, with an
hscurses gui.

If you have some hints on how to make memory consumption go down that
would be great.

Thanks for your kind attention.
Andrea




Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-21 Thread Udo Stenzel
Andrea Rossato wrote:
> I did not get an appreciable improvement with performGC, as you can
> see from here:
> http://gorgias.mine.nu/haskell/a.out.withPerformGC.ps
> 
> But I found a solution: just write the opml state component to a file!

Obviously the values in question were not garbage; rather, they were
unevaluated thunks.  Writing the data causes it to be evaluated, so the
thunks become garbage and get collected.

The correct solution, however, is the application of 'seq' at the right
places.  To understand where these are, perform a simulation of
Haskell's reduction strategy on paper.

 
> > second, each Char in ghc occupies 12 bytes (!)
> > 
> > multiplying this at 2.5 or even 3 factor which i described in previous
> > letter means, say, 30 mb used
> 
> 30 Mega used for reading a feed is a number that I seem to get. 

Depends on what you're doing with the data.  If you scan a stream of
Chars exactly once, the space requirement per Char is next to
irrelevant.  If you're keeping lots of Strings around, using
PackedStrings will help (and be sure to pack strictly).  But I actually
suspect, you are running a backtracking parser over your input, so the
whole input is read into a String and cannot be disposed of as long as
the parser might backtrack.  If this is Parsec, you need to remove a
redundant 'try'.  If it is the Read class, you need to replace it by
Parsec or ReadP...


Udo.
-- 
 I think I'll defragment my hard disk first, throw away all my
CDs and install Löwenzahn, Teletubbies, Pokemon and so on on my
computer. Then I'll run amok. That will give the psychologists PLENTY
to think about.




Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-21 Thread Andrea Rossato
Hello Bulat!

On Fri, Oct 20, 2006 at 10:21:51PM +0400, Bulat Ziganshin wrote:
> first, GC doesn't occur automatically when you close a file. You can
> help GHC by using performGC from System.Mem. I do it in my own program

I did not get an appreciable improvement with performGC, as you can
see from here:
http://gorgias.mine.nu/haskell/a.out.withPerformGC.ps
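The call itself is trivial; here is a sketch of where it went, with
closeFeed standing in for my actual cleanup code:

import System.Mem (performGC)

closeFeed :: IO ()
closeFeed = do
    -- ... drop the references to the feed's data here ...
    performGC   -- ask the RTS for an immediate major collection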

But I found a solution: just write the opml state component to a file!

At first I decided to write it and reload it:
http://gorgias.mine.nu/haskell/a.out.withFileReload.ps

But I get a better result by just writing it (to /dev/null or to a
real file): 
http://gorgias.mine.nu/haskell/a.out.withFileWrite.ps

This way everything is garbage collected and the graph is what I would
expect.

Now, this is a good solution since the opml file is generally small (I
have a file with 100 subscribed feeds and it is just 46Kbyte) and it
stores information on the folder layout, so it must be saved anyway.


> second, each Char in ghc occupies 12 bytes (!), so each of your files
> occupies about 5 mb of memory. If you count the previous problem,
> 2 or 3 files can be held in memory at the same time (just because
> they were not yet GCd), so memory usage may become, say, 10 mb
> 
> multiplying this by the 2.5 or even 3 factor which I described in my
> previous letter means, say, 30 mb used

30 Mega used for reading a feed is about the number I seem to get.

I also prevent the user from opening more than one feed at once. When
a new feed is opened, the old feed gets removed, the opml file is
written, the old stuff is garbage collected and the new stuff is
added. So the memory footprint of the program is determined by the most
memory-consuming feed. It is running at about 60-65 Mega. Not that
bad, compared to the previous situation.

Thank you very much for your suggestions.
Regards
Andrea


Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-20 Thread Andrea Rossato
On Thu, Oct 19, 2006 at 06:23:35PM +0400, Bulat Ziganshin wrote:
> Hello Andrea,
> 
> Wednesday, October 18, 2006, 9:34:28 PM, you wrote:
> 
> > solution? Or just some hints on the kind of problem I'm facing: is it
> > related to strictness/laziness, or it's just that I did not understand
> > a single bit of how garbage collection works in Haskell?
> 
> I think the second. Unfortunately, I don't know a good introduction
> to the actual GC implementation, although I can give you a pair of
> not-so-friendly references:
[...]
> 
> shortly speaking, memory allocated by GHC never shrinks. 


Hello Bulat,
well, you gave me a wonderfully clear introduction to Haskell GC, and
now I have a better understanding of the output of the various
profiling I'm doing. Thank you very much!

Still, I cannot understand my specific problem, that is to say, why
the function that reads a file retains so much memory.

I did some tests and the results are puzzling:
- I tried reading the feed and directly converting it into the opml
chunk to be inserted into the opml component of my StateT monad. The
problem becomes far worse. Here is the output of a heap profile:
http://gorgias.mine.nu/haskell/a.out.feed2opml.ps
as you can see, opening one feed (397868 bytes), closing it, opening
another one (410052 bytes), closing it and reopening the first one
brings memory consumption to 152 Mega.

Using the intermediate datatype (that is to say, reading the feed,
transforming it into my datatype and then into the opml tree) reduces
the problem:
http://gorgias.mine.nu/haskell/a.out.feed2feedNotStrict.ps
only 92 Mega of memory consumption for the very same operations.

Making the intermediate datatype strict gives almost the same results:
http://gorgias.mine.nu/haskell/a.out.feed2feedStrict.ps
98 Mega.

Now, I come to believe the file reading is indeed strict, and that
my problem could be related to StateT laziness.

Does this make sense?

I'm now going to try to implement my opml state as an IORef and use a
ReaderT monad to see if something new happens.

> ps: if your program uses a lot of strings, FPS will be a very great
> help. It doesn't change the GC behavior, just makes everything 10
> times smaller :)

yes, but I'm using HXT, and it uses normal strings to store xml
text nodes. So I could get some improvement in IO but not that much
in memory consumption, unless I totally change my implementation.

Anyway, even if I could reduce the memory consumption for reading 2
feeds from 152 to 15 mega, I'd still be running out of memory, on my
laptop, in one day instead of 5 minutes. So I should face the
fact that it is not the string implementation in Haskell that is
causing the problem. The problem is probably me!

Thanks for your kind attention.
Regards
Andrea


Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-19 Thread Bulat Ziganshin
Hello Andrea,

Wednesday, October 18, 2006, 9:34:28 PM, you wrote:

> solution? Or just some hints on the kind of problem I'm facing: is it
> related to strictness/laziness, or it's just that I did not understand
> a single bit of how garbage collection works in Haskell?

I think the second. Unfortunately, I don't know a good introduction to
the actual GC implementation, although I can give you a pair of
not-so-friendly references:

Generational garbage collection for Haskell
  http://research.microsoft.com/~simonpj/Papers/gen-gc-for-haskell.ps.gz
GHC Commentary about GC
  http://hackage.haskell.org/trac/ghc/wiki/GarbageCollectorNotes
  http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage

In short, memory allocated by GHC never shrinks. GHC by default uses a
2-stage memory allocator. First, memory for new objects is allocated in
256k chunks (the "nursery", whose size is controlled by the RTS -A
option). When these 256 kb are exhausted, a "minor GC" occurs. The
minor GC scans the nursery and moves to the global heap all objects
that are still alive. Btw, this is a great idea, because minor GCs run
very fast, and most objects die at this stage and never participate in
the slow major GCs.

When all the memory currently allocated for the heap is exhausted, a
major GC occurs. It scans all objects in the heap and copies the live
ones to new memory blocks that are allocated during this process. So,
for example, if just before a major GC we have a 30 mb heap where only
10 mb are alive, then during the GC we will allocate 10 mb more, copy
the live objects there, and at the end free the original 30 megs.

But these 30 megs are not returned to the OS! Instead, they become the
free part of the heap, which is then filled with new objects. So, after
this major GC we have a 40 meg heap of which 30 megs are free. The next
major GC will occur when these 30 megs have been filled by objects that
survived the minor GCs.

Well, that's the default scenario. With the compacting GC or with -H,
-F, -G the scenario changes slightly. You can watch the behavior of the
memory manager using the +RTS -Sfile option. Here is an example:

   Alloc   Collect      Live    GC    GC     TOT     TOT  Page Flts
   bytes     bytes     bytes  user  elap    user    elap
...
  262116    266240 119095740  0.00  0.00   23.72   47.33    0    0  (Gen:  0)
  262116    266240 119269488  0.01  0.01   23.73   47.34    0    0  (Gen:  0)
  262116    266240 119443236  0.00  0.00   23.73   47.34    0    0  (Gen:  0)
  262116    266240 119616988  0.01  0.01   23.74   47.35    0    0  (Gen:  0)
  262116 119365632  81473776  1.54  1.66   25.29   49.01    0    0  (Gen:  1)
  262116    266240  81647512  0.00  0.00   25.29   49.01    0    0  (Gen:  0)
  262116    266240  81821260  0.00  0.00   25.29   49.01    0    0  (Gen:  0)
...

Each line here reports the stats of one GC. A '1' in the last column
means a major GC; the others are minor ones. The third column displays
the amount of live data. As you can see, after each minor GC the live
data grows by 100-200 kb; this means that of the 256 kb allocated, only
100-200k survived to be moved into the main heap. The heap size was 120
mb, and when the heap was filled, a major GC occurred. After it ended,
only 81 mb of data had survived, so the heap was extended to
120+80=200 megs, and the next minor GCs continued to fill it up. So
each major GC extends the heap (in the absence of additional RTS
options), and nothing between major GCs can do so. Because it was the
last GC in this run, its stats define the stats of the entire program
run:

  989,191,980 bytes allocated in the heap
  467,301,272 bytes copied during GC
   81,473,776 bytes maximum residency (9 sample(s))

         3584 collections in generation 0 (  6.33s)
            9 collections in generation 1 (  3.23s)

          194 Mb total memory in use

Here GHC reports that the maximum amount of really used memory was 81
megs (GHC can determine real memory usage only at major GCs), while the
memory allocated by the RTS was 200 megs, so the signal/noise ratio is
40% here. Sorry, but that is rather typical. There are various
techniques to fight it, but in general any memory allocation scheme
involves a lot of vanished memory. At least the GHC GC is efficient
enough in terms of time:

  MUT   time   21.73s  ( 45.57s elapsed)
  GC    time    9.56s  (  9.94s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   31.31s  ( 55.51s elapsed)

  %GC time      30.6%  (17.9% elapsed)

PS: if your program uses a lot of strings, FPS will be a very great
help. It doesn't change the GC behavior, just makes everything 10 times
smaller :)
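For example, a tiny sketch ("feed.xml" is just a placeholder name):

import qualified Data.ByteString.Char8 as B

main :: IO ()
main = do
    s <- B.readFile "feed.xml"   -- one packed buffer, ~1 byte per Char
    putStrLn $ "bytes read: " ++ show (B.length s)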
  


-- 
Best regards,
 Bulat    mailto:[EMAIL PROTECTED]



Re: [Haskell-cafe] memory, garbage collection and other newbie's issues

2006-10-18 Thread Jason Dagit

On 10/18/06, Andrea Rossato <[EMAIL PROTECTED]> wrote:

Hi!

I'm a newbie and, as a learning experience, I'm writing a feed reader
with hscurses and hxt. For the present time the feed reader just reads
a Liferea cache but, as you can imagine, I'm running into the usual
newbie problems of memory consumption and garbage collection, probably
(I'm not sure) related to strictness/laziness.

Even though I spent a couple of hours searching the mailing list
archives, I did not come up with something I can relate to, so I'll
try to explain my problem.

The feed reader, which should be compatible with Liferea, takes an opml
(1.0) file that stores information on folders and subscribed feeds.
It uses this file as the major component of the state of a StateT
monad, after adding some attributes used by the reader UI.

The UI, which uses the widget library of hscurses and is derived from
the Contact Manager example, will just display this opml file; every
UI event (collapsing/expanding of folders, displaying feeds,
tagging, flagging, and so on) is just an XML transformation of this
opml state component.

So, when the feed reader boots, only the layout of folders and
subscribed feeds is presented to the user.

When the user selects a feed to be displayed, the cached file,
containing up to 100 saved posts, is read and transformed into a data
type (called Feed, obviously). After that, this data type is
transformed into an opml (xml) tree, which is inserted as a child in
the appropriate place of the opml state component.

Moreover, the parent element of the opml state component (which holds
the original information of the subscribed feed) is edited to add
general feed information (such as the last update, the feed's
attributes, and so on) retrieved by reading the file.
When the user collapses the feed, the added opml chunk is deleted from
the state component (but not the information added to the parent of
this chunk).

Now, I would expect that after the opml chunk is deleted, all the
memory allocated for reading the cached file would be garbage
collected. This is not happening: every time you open (or reopen)
a feed, the memory used by the feed reader increases, and never
decreases.

After profiling I've seen that the problem is occurring in the
function that reads the cached file:

loadFeed :: String -> IO [Feed]
loadFeed id =
    do [a] <- runX $ readDocument [(a_validate, v_0)] (cachePath ++ id)
       return $ runLA toFeed a

What this function does is read the file with:

h <- openFile ...
hGetContents h

and apply some XML filters to get the Feed type populated with the
needed information.

I tried making the function strict with $!. I tried using fps. It
doesn't change this behaviour, obviously.


Have you tried adding strictness annotations to your data type?  For
example, something like this:

data Foo a = Foo !a

If you do this for the datatypes you want to get deleted, my
understanding is that it will help.  I think this helps because the
garbage collector will have one less excuse for not cleaning up the
values, since they will be forced (at least to weak head normal form)
when they are created, instead of existing only partially (in which
case, if I understand correctly, the garbage collector doesn't throw
away partially constructed values).
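For instance (a made-up example of the effect):

data Counter = Counter !Int

tick :: Counter -> Counter
tick (Counter n) = Counter (n + 1)
-- With a plain field, (n + 1) would be stored as a thunk holding on
-- to n; with the strict field it is evaluated as soon as the new
-- Counter itself is.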

You said you did profiling, have you done retainer profiling?  I
haven't used it myself, but I think that it is designed to help you
identify where the memory is being leaked.

I've been using haskell on and off for over a year now but still
consider myself to be mostly a newbie as well.  I started using
haskell at work recently on a project and I've found that although
haskell makes it so that I don't spend much time debugging (thanks
referential transparency!), testing (thanks quickcheck!), or writing
code (thanks haskell in general!), I do spend a lot of time profiling
and optimizing for time/space.  I think it's interesting that the
development is mostly quick and easy but polishing and making it ready
for general use can still be hard because of performance issues.
Although, I'd rather spend my time optimizing something that works
than debugging pointer problems.

HTH,
Jason