Ian Jackson <[EMAIL PROTECTED]> wrote:
> I'm probably being dense here and not seeing the wood for the trees,
> and I've included lots of tedious junk below, but here goes.
>
> I have a program which runs too slowly. I knew when I wrote it that I
> was taking a `naive' approach to the program - I've tried to write
> what I meant rather than what I thought might be fast; if it turned
> out to run too slowly I was planning to optimise it then. So here I
> am :-).
IMHO, in any serious application (i.e., one where resource
consumption is not a complete non-issue) developed in a
super-high-level language like Haskell, you will spend part
of the time that the high-levelness saved you during
development on performance debugging your code instead.[1]
> I've been using GHC's profiling support. This gives me information
> about numbers of `allocations' (which I presume refers to memory
> allocations) and does indeed point the finger at the part of the
> program that I most suspected anyway.
>
> I think the program is using too much CPU rather than too much memory
> because (a) my machine doesn't thrash and (b) the test case I'm using
> is not very large.
Memory and CPU usage are more tightly coupled in a
functional program (especially, a lazy functional one) than
in more conventional programming languages. This is for a
number of reasons:
* In lazy languages, memory is allocated very rapidly and
much of it is short-lived. As a result, the parts of the
program that allocate the most memory are usually also the
parts that are executed most often.
* If there is a space leak (ie, large amounts of data
structures are still referenced - and thus not garbage
collected - although they will never be used again),
garbage collection time can be substantial. Run your
program with +RTS -S to get the GC statistics (the
statistics are written to <prgm>.stat if your executable
is <prgm>).
The fact that your machine doesn't thrash need not mean
anything. GHC has an upper bound on the memory it is
prepared to grab from the OS (by default 64MB, I think).
After that, the code might just spend most of its time
doing GCs within that fixed area.
> However, my GHC installation is a bit `dodgy' and a bit stale too, and
> the profiled executables suffer from (a) lack of any CPU time
> indication in the profiling reports and (b) occasionally coredumping
> unexpectedly. Rest assured that I plan to update my GHC installation
> and recheck all my settings, and will report the problem to the GHC
> people if it still doesn't work properly - I'm just giving this info
> as background detail, not as a problem report.
In fact, according to GHC HQ, there is much better profiling
infrastructure in the next release (and currently in CVS).
> I have a number of not really very closely related questions:
>
> * I find it difficult to understand how the code I write translates
> into actual algorithms, memory management, etc. Haskell is such a
> nice language that it hides all this detail from me :-). So, I'd be
> grateful for a reference or two on this area.
You might like to check out
Implementing lazy functional languages on stock hardware:
the Spineless Tagless G-machine, SL Peyton Jones, Journal
of Functional Programming 2(2), Apr 1992, pp127-202.
which you can get from
http://research.microsoft.com/Users/simonpj/Papers/papers.html#compiler
[Actually, the link on that page doesn't work for me - some
ASP lossage, I suspect, but maybe you are more lucky.]
> * I added a reasonable amount of added strictness in `obvious' kind of
> places, but with no visible effect on performance. Does adding
> strictness help with CPU use as well as with memory use ? Where is it
> most beneficial to add strictness ?
Strictness can help with CPU use, but whether strictness is
really what you need and, if so, where, depends very much on
the real cause of your problem.
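To illustrate the kind of place where added strictness typically pays off, here is a minimal, hypothetical sketch (not the poster's code): a lazy accumulator builds a chain of unevaluated thunks, while forcing it with `seq` before each recursive call keeps it evaluated, which saves both memory and the CPU time spent building and later collapsing the thunks.

```haskell
-- Lazy accumulator: `acc' grows as a chain of suspended additions.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where go acc []     = acc
        go acc (x:xs) = go (acc + x) xs

-- Strict accumulator: `seq' forces the sum at every step, so the
-- accumulator is always a plain evaluated Int.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where go acc []     = acc
        go acc (x:xs) = let acc' = acc + x
                        in acc' `seq` go acc' xs

main :: IO ()
main = print (sumStrict [1..10000])
```

The point is that strictness annotations help most on accumulating parameters in loops like this one; scattering them in `obvious' places elsewhere often changes nothing, which would be consistent with what you observed.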
> * Most of my program is in a state-threading monad of my own (built
> out of normal functional pieces). The main program (which is in the
> IO monad) invokes the `grind handle on state machine' function,
> passing the action to perform and the initial state and getting the
> new state back. Is this good, bad, or ugly ?
Sounds fine to me.
> It does make it easy to
> decouple the nasty I/O handling event loop part of the program from
> the clean, supposedly predictable, state engine. (Yes, the program
> does really need to be a state engine.)
This definitely is a good idea.
One possible cause of a space leak might be that you somehow
retain a reference to ``old'' states even after they are
(algorithmically) no longer needed.
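For reference, a hand-rolled state-threading monad of the kind you describe might look roughly like the following sketch (all names here are invented, not your code). The `runSM' function is the ``grind handle on state machine'' entry point: it takes an action and an initial state and gives back the result and the new state.

```haskell
import Control.Monad (ap, liftM)

-- A state transformer: a computation is a function from the old
-- state to a result paired with the new state.
newtype SM s a = SM { runSM :: s -> (a, s) }

instance Functor (SM s) where
  fmap = liftM

instance Applicative (SM s) where
  pure x = SM (\s -> (x, s))
  (<*>)  = ap

instance Monad (SM s) where
  SM m >>= k = SM (\s -> let (a, s') = m s
                         in  runSM (k a) s')

-- Read and replace the threaded state.
get :: SM s s
get = SM (\s -> (s, s))

put :: s -> SM s ()
put s = SM (\_ -> ((), s))

-- A toy action: return the counter and increment it.
tick :: SM Int Int
tick = do n <- get; put (n + 1); return n

main :: IO ()
main = print (runSM (tick >> tick >> tick) 0)
```

Note that `runSM' hands back exactly one new state; as long as the IO event loop only ever keeps the latest state and drops the old one, no references to ``old'' states should survive.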
> * In particular, the bit of code that seems to be slow is the part
> responsible for updating a part of the state whose type is broadly
> speaking
>
> Array (x,y) [value]
>
> I need to do similar cumulative updates to the values in a whole
> region of the array, and I do it with (basically) this
>
> -- This applies an update to all cells within range of the region:
>
> regionallyUpdateArenaArray ::
> Region -> Coord -> (Distance -> v -> v) -> ArenaArray v
> -> ArenaArray v
>
> regionallyUpdateArenaArray
> region@((xmin, ymin), (xmax,ymax))
> range uf old
> = accum (flip uf) old loc_dists
> where loc_dists = [ (loc, dist) |
> x <- [ max 0 (xmin-range) .. min (xsz-1) (xmax+range) ],
> y <- [ max 0 (ymin-range) .. min (ysz-1) (ymax+range) ],
> let loc = (x,y),
> let dist = region |-| locationToRegion loc,
> dist <= maxdist
> ]
> maxdist = toDistance range
>
> The |-| operator calculates the distance (actually, the squared
> distance embedded in a newtype) between two regions.
>
> In C I would have just had one big flat array and updated it in
> place. How can I get a Haskell compiler to do the same ?
>
> * I really wanted a simple, flat, strict type rather than the lazy,
> nested [value]. I'm only using a list because I want to do a zipWith
> on it in the update function. Also, the manual says that an Array is
> not strict in its contents, which doesn't seem helpful to me.
Hmm, that looks quite messy. Do I understand you correctly:
you would rather avoid the two-level hierarchical structure
that you have now, and you are using it only to improve the
efficiency of the update?
Have you thought about using a (balanced) tree-based finite
map instead? Whether that makes sense depends on the size
of your structure and how many entries you expect to update.
I have an implementation of finite maps here:
http://www.cse.unsw.edu.au/~chak/haskell/code/FiniteMaps.hs
And GHC also includes one.
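To make the finite-map idea concrete, here is a small hypothetical sketch (using Data.Map from today's libraries as a stand-in for FiniteMaps.hs; the names are made up). The attraction is that a regional update only rebuilds the path to each touched entry - the rest of the tree is shared between the old and new versions - so the cost is proportional to the number of updated cells, not the size of the whole arena.

```haskell
import qualified Data.Map as M

-- Cells indexed by coordinate; untouched cells are shared between
-- the old and new map.
type Grid = M.Map (Int, Int) Int

-- Apply an update function to every listed cell.
updateRegion :: (Int -> Int) -> [(Int, Int)] -> Grid -> Grid
updateRegion f locs g = foldr (M.adjust f) g locs

main :: IO ()
main = do
  let g0 = M.fromList [ ((x, y), 0) | x <- [0..2], y <- [0..2] ]
      g1 = updateRegion (+5) [(0,0), (1,1)] g0
  print (M.lookup (1,1) g1)
```

Whether this beats the array depends, as said, on how large the arena is and what fraction of it each update touches.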
It might be that `regionallyUpdateArenaArray' hangs on to
old versions of the updated array, gradually filling up
memory.
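As for getting the C-style ``one big flat array, updated in place'': GHC lets you do exactly that inside the ST monad with unboxed mutable arrays, which gives you destructive updates while the function stays pure from the outside. A hedged sketch (the bounds, element type, and update are invented for illustration):

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, (!))

-- Increment every cell in a rectangular region of a 10x10 arena,
-- mutating one flat unboxed array in place; runSTUArray freezes it
-- into an immutable UArray at the end, with no copying per update.
bumpRegion :: ((Int, Int), (Int, Int)) -> UArray (Int, Int) Int
bumpRegion ((xmin, ymin), (xmax, ymax)) = runSTUArray $ do
  arr <- newArray ((0, 0), (9, 9)) 0
  forM_ [ (x, y) | x <- [xmin..xmax], y <- [ymin..ymax] ] $ \loc -> do
    v <- readArray arr loc
    writeArray arr loc (v + 1)        -- in-place update
  return arr

main :: IO ()
main = print (bumpRegion ((2,2), (4,4)) ! (3,3))
```

Because the whole update runs inside one runSTUArray, there is also no way to hang on to intermediate array versions, which sidesteps the leak I mentioned above. The elements must be unboxable (Int, Double, etc.) for the unboxed variant, which also gives you the flat, strict representation you asked for.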
Manuel
[1] This is a Good Thing, because, I think, you will
still save time overall, and performance debugging is
easier than digging through core dumps. In OSS
development especially this should be an advantage,
because the time needed to get your app to a level where
you can make the first developer release is shorter.