On Mar 8, 2007, at 12:00 PM, David Roundy wrote:

Hi all,

I was just teaching a class on minimization methods, with a focus on conjugate gradient minimization in particular, and one of my main points was that with conjugate gradient minimization we only need three or four arrays of size N (depending on whether we use the Fletcher-Reeves or Polak-Ribiere variant), ... This got me thinking about one of the largest problems with writing serious numerical code in Haskell, which is that of memory consumption and avoiding temporary variables.

I've been following this discussion with interest, as I've been looking in some detail at conjugate gradient algorithms as part of my day job, and I've spent a good deal of time thinking about exactly the problems you raise. For those following along at home, here's a sample somewhat-imperative CG algorithm (this is the simplified stripped-down version):

    for j <- seq(1#cgit) do
      q = A p
      alpha = rho / (p DOT q)
      z += alpha p
      rho0 = rho
      r -= alpha q
      rho := r DOT r
      beta = rho / rho0
      p := r + beta p
    end

Here p,q,r, and z are vectors, A is the derivative of our function (in this case a sparse symmetric positive-definite matrix, but we can really think of it as a higher-order function of type Vector->Vector) and the greek letters are scalars. The "answer" is z. In practice we'd not run a fixed number of iterations, but instead do a convergence test. All the hard work is in the line "q = A p", but the storage consumption is mostly in the surrounding code. On a parallel machine (where these sorts of programs are often run) this part of the algorithm has almost no parallelism---all those dot products and normalizations preclude it---but the A p step and the dot products themselves are of course quite parallelizable.
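
To make the comparison concrete, here is roughly what one iteration of that loop looks like if we transliterate it into the obvious purely functional style (just a sketch: the list-backed Vector type and the names dot, axpy, and cgStep are mine, not from any library):

    import Data.List (foldl')

    type Vector = [Double]            -- stand-in; real code would use unboxed arrays

    dot :: Vector -> Vector -> Double
    dot xs ys = foldl' (+) 0 (zipWith (*) xs ys)

    axpy :: Double -> Vector -> Vector -> Vector    -- axpy a x y = a*x + y
    axpy a xs ys = zipWith (\x y -> a * x + y) xs ys

    -- One iteration of the loop above, written declaratively.  Every
    -- "updated" vector is a freshly allocated one; the old z, r, and p
    -- become garbage that the imperative code would simply have overwritten.
    cgStep :: (Vector -> Vector)                  -- the operator A
           -> (Vector, Vector, Vector, Double)    -- (z, r, p, rho)
           -> (Vector, Vector, Vector, Double)
    cgStep applyA (z, r, p, rho) = (z', r', p', rho')
      where
        q     = applyA p
        alpha = rho / (p `dot` q)
        z'    = axpy alpha p z             -- z += alpha p
        r'    = axpy (negate alpha) q r    -- r -= alpha q
        rho'  = r' `dot` r'                -- rho := r DOT r
        beta  = rho' / rho                 -- the old rho plays the role of rho0
        p'    = axpy beta p r'             -- p := r + beta p

It reads like the mathematics, but every vector bound on an iteration is a fresh allocation, which is exactly the memory behaviour David is worried about.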

Sadly, many of the suggestions, while generally sound, just don't apply well to this particular use case.

* As others have noted, burying the updatable state in a monad just begs the question. The resulting code looks nothing at all like the mathematics, either, which is a big problem if you're trying to understand and maintain it (the above code is virtually isomorphic to the problem specification). I'm sure David is seeking a more declarative version of the code than the spec from which we were working. Note that rather than embedding all the state in a special monad, we might as well be using update-in-place techniques (such as the += and -= operations you see above) with the monads we've already got; the result will at least be readable, but it will be too imperative.
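
Here is what I mean, as a sketch using IOUArray (the helpers axpyInPlace, dotM, and cgStepIO are my names, not library functions): it reuses storage and is perfectly readable, but it is just the imperative program written out in a monad.

    import Data.Array.IO (IOUArray, getBounds, readArray, writeArray)
    import Control.Monad (forM_)

    type Vec = IOUArray Int Double

    -- x += a * y, updating x in place.
    axpyInPlace :: Double -> Vec -> Vec -> IO ()
    axpyInPlace a y x = do
      (lo, hi) <- getBounds x
      forM_ [lo .. hi] $ \i -> do
        xi <- readArray x i
        yi <- readArray y i
        writeArray x i (xi + a * yi)

    dotM :: Vec -> Vec -> IO Double
    dotM x y = do
      (lo, hi) <- getBounds x
      let go acc i
            | i > hi    = return acc
            | otherwise = do xi <- readArray x i
                             yi <- readArray y i
                             let acc' = acc + xi * yi
                             acc' `seq` go acc' (i + 1)
      go 0 lo

    -- One iteration, with the monad making the sequencing explicit.  It
    -- reuses all the storage, but it is simply the imperative program
    -- transliterated.
    cgStepIO :: (Vec -> IO Vec) -> Vec -> Vec -> Vec -> Double -> IO Double
    cgStepIO applyA z r p rho = do
      q  <- applyA p
      pq <- dotM p q
      let alpha = rho / pq
      axpyInPlace alpha p z              -- z += alpha p
      axpyInPlace (negate alpha) q r     -- r -= alpha q
      rho' <- dotM r r                   -- rho := r DOT r
      let beta = rho' / rho
      (lo, hi) <- getBounds p            -- p := r + beta p, in place
      forM_ [lo .. hi] $ \i -> do
        ri   <- readArray r i
        pOld <- readArray p i
        writeArray p i (ri + beta * pOld)
      return rho'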

* There are opportunities for loop fusion in CG algorithms---we can compute multiple dot products on the same array in a single loop---but these have the flavor of fusing multiple consumers of a data structure into a single consumer, which I believe is still an unsolved problem in equational frameworks like foldr/build or streams. It's a bit like fusing:

   n = foldl' (+) 0 . map (const 1) $ xs
   sum_xs = foldl' (+) 0 $ xs
   sum_sq = foldl' (+) 0 . map (\x->x*x) $ xs

into:

   (n,sum_xs,sum_sq) =
      foldl' (\(a0,b0,c0) (an,bn,cn)->(a0+an, b0+bn, c0+cn)) (0,0,0) .
      map (\x->(const 1 x, x, x*x)) $ xs

which we understand how to do, but not equationally (or at least we didn't last I looked, though Andy Gill and I had both fantasized about possibilities for doing so).
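
Done by hand, the fused traversal is just one strict accumulating pass, something like the sketch below (stats is my name for it; note that the foldl'-over-a-triple version above still builds thunks inside the tuple components, so the strictness annotations here are doing real work):

    {-# LANGUAGE BangPatterns #-}

    -- The three folds over xs, fused by hand into one strict pass.
    stats :: [Double] -> (Double, Double, Double)
    stats = go 0 0 0
      where
        go !n !s !sq []       = (n, s, sq)
        go !n !s !sq (x : xs) = go (n + 1) (s + x) (sq + x * x) xs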

None of these fusion opportunities actually save space, which makes the problem trickier still.

* The algorithm as written already tries to express minimal storage. The only question is, do +=, -=, and := update their left-hand side in place, or do we think of them (in the Haskell model of the universe) as producing fresh arrays, with the previous values becoming newly-created garbage? My challenge to fellow Haskell hackers: find a way to express this such that it doesn't look so imperative.

* Even if we do a good job with += and -=, which is what David seems to be looking for, we still have those irritating := assignments---we'd like to throw out the old p and reuse the space in the last line. And we'd like to have one piece of storage to hold the q on each successive iteration. So even if we solve David's problem, we still haven't matched the space requirements of the imperative code.

* DiffArrays are too expensive to be acceptable here. It's not even a question of unboxing. Let's say we keep the current array in fast, unboxed storage; this lets us read its elements using a single load. Each update still needs to retrieve the old data from the array and add it to the old-versions lookup table; together these operations are much more expensive than the actual update to the current version of the array. And we need to do this even though no old versions exist! We should be able to avoid this work entirely. (And, if old versions do exist, a DiffArray is the pessimal representation for them, given that we're updating the whole array.)
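
For concreteness, here is the whole-array update pattern I'm describing, written against the old Data.Array.Diff module (a sketch; axpyDiff is my name, and depending on your version of the array package the module may live elsewhere):

    import Data.Array.Diff   (DiffUArray)
    import Data.Array.IArray ((!), (//), indices)

    type DVec = DiffUArray Int Double

    -- z += alpha p, with DiffArray semantics.  The (//) update walks the
    -- whole array, and for every element the displaced old value has to
    -- be squirreled away so that the superseded version of z could still
    -- be read---pure overhead when that old z is about to become garbage.
    axpyDiff :: Double -> DVec -> DVec -> DVec
    axpyDiff alpha p z = z // [ (i, z ! i + alpha * (p ! i)) | i <- indices z ]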

* Linear or uniqueness types are almost what we want. I think Josef Svenningsson was the one who captured this the best: uniqueness types *require* that the *caller* of these routines make sure that it is not sharing the data. So we need two copies of our linear algebra library---one which takes unique arrays as arguments and updates in place, and one which takes non-unique arrays and allocates. And we have to pick the right one based on context. What we want, it seems to me, is one library and a compiler which can make informed judgments.
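
The closest we get to "one library" in today's Haskell is to write the in-place version once inside ST and wrap it, but then the wrapper has to copy its argument first, which is precisely the allocation we wanted to avoid. A sketch (axpyCopying is my name; it computes x + a*y into fresh storage by copying x and updating the copy):

    import Control.Monad      (forM_)
    import Data.Array.IArray  ((!))
    import Data.Array.MArray  (thaw, getBounds, readArray, writeArray)
    import Data.Array.ST      (runSTUArray)
    import Data.Array.Unboxed (UArray)

    type UVec = UArray Int Double

    -- The "non-unique" entry point written in terms of the in-place one:
    -- copy the input, then update the copy destructively inside ST.  One
    -- implementation, but the copy is exactly the allocation we were
    -- trying to avoid, so this only matches the imperative code when the
    -- caller genuinely wanted fresh storage.
    axpyCopying :: Double -> UVec -> UVec -> UVec
    axpyCopying a y x = runSTUArray $ do
      x' <- thaw x
      (lo, hi) <- getBounds x'
      forM_ [lo .. hi] $ \i -> do
        xi <- readArray x' i
        writeArray x' i (xi + a * (y ! i))
      return x'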

* We could imagine tracking uniqueness dynamically at run time, using something like reference counting for all our arrays. But we need to do the reference counting precisely---this is pretty much the most inefficient way possible of tracking the storage, and doesn't play well at all with using efficient GC elsewhere in our programs. The inefficiency might be worth it for arrays, but Haskell is polymorphic and in many cases we need to treat all our data the same way.

* Finally, I'll observe that we often want to use slightly different algorithms depending upon whether we're updating in place or computing into fresh storage. Often copying the data and then updating it in place is not actually a good idea.

I'd love to hear if anyone has insights or pointers to related work on any of the issues above; I'm especially keen to learn if there's work I didn't know about on fusion of multiple traversals. In my day job with Fortress we are looking at RULES-like approaches, but they founder quickly, because the kinds of problems David is trying to solve are 90% of what our programmers want to do.

-Jan-Willem Maessen

